
Shed Skin Python-to-C++ Compiler 0.0.21, Help needed


Mark Dufour

Mar 31, 2007, 05:55:22
To: pytho...@python.org
Hi all,

I have recently released version 0.0.20 and 0.0.21 of Shed Skin, an
optimizing Python-to-C++ compiler. Shed Skin allows for translation of
pure (unmodified), implicitly statically typed Python programs into
optimized C++, and hence, highly optimized machine language. Besides
many bug fixes and optimizations, these releases add the following
changes:

-support for 'bisect', 'collections.deque' and 'string.maketrans'
-improved 'copy' support
-support for 'try, else' construction
-improved error checking for dynamic types
-printing of floats is now much closer to CPython

For more details about Shed Skin, and for a collection of 27 programs
totalling about 7,000 lines that it can compile (resulting in an
average speedup of about 39 times over CPython and 11 times over Psyco
on my computer), please visit the homepage at:

http://mark.dufour.googlepages.com
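
As a rough illustration of "implicitly statically typed" (a hypothetical
snippet, not one of the bundled programs, and untested against this
release): every name keeps a single, inferable type, so no annotations
are needed:

def primes_below(n):              # n is only ever an int
    sieve = [True] * n            # list of bool
    for i in range(2, n):
        if sieve[i]:
            for j in range(i * i, n, i):
                sieve[j] = False
    return [i for i in range(2, n) if sieve[i]]   # list of int

print(primes_below(50))

The point is simply that no name ever holds values of two different
types, which is what lets the compiler pick one C++ type per variable.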

I could really use more help in pushing Shed Skin further. Simple ways
to help out, which can still save me lots of time, are to find smallish
code fragments that Shed Skin currently breaks on, and to help
improve/optimize the (C++) builtins and core libraries. I'm also
hoping someone else would like to deal with integration with CPython
(so Shed Skin can generate extension modules, and it becomes easier to
use 'arbitrary' external CPython modules such as 're' and 'pygame'.)
Finally, there may be some interesting Master's thesis subjects in
improving Shed Skin, such as transforming heap allocation into stack-
and static preallocation, where possible, to bring performance even
closer to manual C++. Please let me know if you are interested in
helping out, and/or join the Shed Skin mailing list.


Thanks!
Mark Dufour.
--
"One of my most productive days was throwing away 1000 lines of code"
- Ken Thompson

Bjoern Schliessmann

Mar 31, 2007, 07:38:19
Mark Dufour wrote:

> Shed Skin allows for translation of pure (unmodified), implicitly
> statically typed Python programs into optimized C++, and hence,

^^^^^
> highly optimized machine language.
^^^^^^^^^^^^^^^^

Wow, I bet all C++ compiler manufacturers would want you to work for
them.

Regards,


Björn

--
BOFH excuse #23:

improperly oriented keyboard

sk...@pobox.com

Mar 31, 2007, 10:27:19
To: Bjoern Schliessmann, pytho...@python.org

Björn> Mark Dufour wrote:
>> Shed Skin allows for translation of pure (unmodified), implicitly
>> statically typed Python programs into optimized C++, and hence,
>> highly optimized machine language.
Bjoern> ^^^^^^^^^^^^^^^^

Bjoern> Wow, I bet all C++ compiler manufacturers would want you to work
Bjoern> for them.

Why are you taking potshots at Mark? He's maybe onto something and he's
asking for help. If he can generate efficient C++ code from implicitly
statically typed Python, it stands to reason that he can take advantage of the
compiler's optimization facilities.

Skip

Bjoern Schliessmann

Mar 31, 2007, 17:14:49
sk...@pobox.com wrote:

> Why are you taking potshots at Mark?

What suggests that I'm "taking potshots" at Mark?

> He's maybe onto something and he's asking for help. If he can
> generate efficient C++ code from implicitly statically type Python
> it stands to reason that he can take advantage of the compiler's
> optimization facilities.

Yes, compilers do output optimized machine code. But generally
calling that code "highly optimized" is, IMHO, an exaggeration.

Regards,


Björn

--
BOFH excuse #426:

internet is needed to catch the etherbunny

Luis M. González

Mar 31, 2007, 17:26:13
On Mar 31, 8:38 am, Bjoern Schliessmann <usenet-mail-0306.20.chr0n...@spamgourmet.com> wrote:


Mark has been doing a heroic job so far.
Shedskin is an impressive piece of software and, if pypy hadn't been
started some time ago, it would have gotten more attention from the
community.
I think he should be taken very seriously.

He is the first programmer I know of who actually released working
code (and a lot of it) for a project that actually manages to speed up
python by a large margin, by means of advanced type inference
techniques.
Other people, in the past, have attended conferences and made
spectacular announcements of projects that could speed up python by
60x or more, but never ever released any code.

Mark has been working quietly for a long time, and his work deserves
a lot of credit (and hopefully, some help).


Alexander Schmolck

Mar 31, 2007, 19:45:16
"Luis M. González" <lui...@gmail.com> writes:

> On Mar 31, 8:38 am, Bjoern Schliessmann <usenet-
> mail-0306.20.chr0n...@spamgourmet.com> wrote:
> > Mark Dufour wrote:
> > > Shed Skin allows for translation of pure (unmodified), implicitly
> > > statically typed Python programs into optimized C++, and hence,
> >
> > ^^^^^
> > > highly optimized machine language.
> >
> > ^^^^^^^^^^^^^^^^
> >
> > Wow, I bet all C++ compiler manufacturers would want you to work for
> > them.
> >
> > Regards,
> >
> > Björn
> >
> > --
> > BOFH excuse #23:
> >
> > improperly oriented keyboard
>
>
> Mark has been doing an heroic job so far.
> Shedskin is an impressive piece of software and, if pypy hadn't been
> started some time ago, it should have gotten more attention from the
> community.

Regardless of its merits, it's GPL'ed, which I assume is an immediate turn-off
for many in the community.

'as

Paul Boddie

Mar 31, 2007, 20:34:56
Alexander Schmolck wrote:
>
> Regardless of its merits, it's GPL'ed, which I assume is an immediate turn-off
> for many in the community.

In the way that tools such as gcc are GPL-licensed, or do you have
something else in mind?

Paul

Paul McGuire

Mar 31, 2007, 20:35:15
On Mar 31, 6:45 pm, Alexander Schmolck <a.schmo...@gmail.com> wrote:
> Regardless of its merits, it's GPL'ed, which I assume is an immediate turn-off
> for many in the community.
>

Why would that be? GPL'ed code libraries can be a turn-off for those
who want to release commercial products using them, but a GPL'ed
utility such as a compiler imposes no encumbrance on the
compiled object code it generates.

-- Paul

Paul Rubin

Mar 31, 2007, 21:24:31
"Paul McGuire" <pt...@austin.rr.com> writes:
> Why would that be? GPL'ed code libraries can be a turn-off for those
> who want to release commercial products using them, but a GPL'ed
> utility such as a compiler bears no relationship or encumbrance on the
> compiled object code it generates.

For some of us, doing volunteer work on non-GPL projects is a
turn-off. I don't mind writing code that goes into proprietary
products, but I expect to get paid for it just like the vendors of the
products expect to get paid. If I'm working for free I expect the
code to stay free. This is why I don't contribute code to Python on
any scale.

Bjoern Schliessmann

Mar 31, 2007, 21:31:31
Luis M. González wrote:

> I think he should be taken very seriously.

Agreed.

Okay, it seems focusing a discussion on one single point is
difficult for many people. Next time I'll be so mind-bogglingly clear
that even the last one understands it after reading it once ...

Regards,


Björn

Fup2 p

--
BOFH excuse #46:

waste water tank overflowed onto computer

Luis M. González

Mar 31, 2007, 21:49:11
On Mar 31, 10:31 pm, Bjoern Schliessmann <usenet-mail-0306.20.chr0n...@spamgourmet.com> wrote:


Bjoern,

I understood what you said. It's just that it seemed you were
mocking the poster's message.
I apologize if that wasn't your intention.

Luis

(Deleted post)

Michael Torrie

Mar 31, 2007, 23:40:51
To: pytho...@python.org
On Sun, 2007-04-01 at 02:49 +0000, Dennis Lee Bieber wrote:
> Take that up with ACT... GNAT 3.15p was explicitly unencumbered, but
> the current version of GNAT, in the GPL (no-service contract) form has
> gone the other direction, claiming that executables must be released
> GPL.

The no-service contract version of the GPL is not the same as the
standard GPLv2. Ordinarily the GPLv2 does not apply to the output of
the program unless the license specifies that it does (a modification or
addendum). Thus the output of a program is usually not part of the GPL,
unless specified. MySQL's take on the GPLv2 without an addendum is
mistaken, in my opinion. However, copyright law probably still applies
to the program's output regardless of license, but in what way I don't
think the courts have ever specified, given that the output depends
largely on the input. GCC, Bison, and Flex all explicitly state that
the output of the program is not under any license, and is your own
property. Perhaps the author of Shed Skin could make a note in the
license file to clarify the state of the output of his program.

There should be no problem with this Shed Skin program being under the
GPL and using it with python scripts that are not under the GPL. But if
you have any concern with a copyright license at all, you should consult
your lawyer. Too many companies see GPL'd programs as a free ride, not
willing to accept that they need a copyright license to use the code
just as they would with any code from any source. It's sad to see
because free software gets an unfair bad rap because of the greed of
others. On the other hand, others take an overly paranoid view of the
GPL and pretend it is viral and somehow magically infects your code with
the GPL license, which is false: if you use GPL'd code in your non-GPL'd
application, then you are in a copyright violation situation and your
only options are to either GPL your code or remove the offending GPL'd
source from your code and write your own dang code, thank you very much.

Paul Rubin

Mar 31, 2007, 23:47:36
Michael Torrie <tor...@chem.byu.edu> writes:
> The no-service contract version of the GPL is not the same as the
> standard GPLv2.

I don't see how that can be--we're talking about a GCC-based compiler,
right?

Michael Torrie

Apr 1, 2007, 01:06:08
To: pytho...@python.org

Well, that's beside the point anyway. The output of a program is beyond
the scope of the source code license for the program. However, the
default is for the output to be copyrighted by the author. Thus the author
of a program is free to say (grant a license, in other words) that the
output of a program can be distributed. The real point is that the Shed Skin
author can both license the program under the GPLv2 and also say that
the output from his program is not bound by any license. There's no
conflict unless the author of Shed Skin wants there to be. Worst case,
if indeed the GPLv2 says it covers the output of the program (which I
don't believe it does), copyright law still trumps everything and the
author is free to add an exemption to the license if he chooses, which
is what I've seen done with Bison. Bison is also a special case because
the output of bison contains code fragments that are part of the bison
source code itself, which is under the GPL. Thus a special exception
had to be made in this case.

Anyway, the only real point is that if there is a concern about the
copyright and licensing of the output of Shed Skin, then we merely need
to ask the author of it to clarify matters and move on with life. With
the exception of GNAT, to date no GPL'd compiler has ever placed a GPL
restriction on its output. Whether this is explicit or implicit doesn't
matter, so long as it's there.


Michael Torrie

Apr 1, 2007, 01:13:49
To: pytho...@python.org
On Sat, 2007-03-31 at 20:47 -0700, Paul Rubin wrote:

I found the real reason why the GPL'd GNAT compiler's produced
executables are required to be GPL'd, and it has nothing to do with the
license of the compiler:

"What is the license of the GNAT GPL Edition?
Everything (tools, runtime, libraries) in the GNAT GPL Edition is
licensed under the General Public License (GPL). This ensures that
executables generated by the GNAT GPL Edition are Free Software and that
source code is made available with the executables, giving the freedom
to recipients to run, study, modify, adapt, and redistribute sources and
executables under the terms of the GPL."[1]

Note that it says the runtime *and* the libraries are GPL. Thus the
linking clause in the GPL requires that programs that link against them
(the executable in other words) must be GPL'd. Note that GLibC, while
being GPL, has an exception clause in it, allowing linking to it by code
of any license.

Hence it's a red herring as far as the discussion and Shed Skin are
concerned, although the licensing of any Shed Skin runtime libraries
should be a concern to folks.

[1] https://libre.adacore.com/

John Nagle

Apr 1, 2007, 01:15:45
Mark Dufour wrote:
> Hi all,
>
> I have recently released version 0.0.20 and 0.0.21 of Shed Skin, an
> optimizing Python-to-C++ compiler. Shed Skin allows for translation of
> pure (unmodified), implicitly statically typed Python programs into
> optimized C++, and hence, highly optimized machine language. Besides
> many bug fixes and optimizations, these releases add the following
> changes:
>
> I'm also
> hoping someone else would like to deal with integration with CPython
> (so Shed Skin can generate extension modules, and it becomes easier to
> use 'arbitrary' external CPython modules such as 're' and 'pygame'.)

Reusing precompiled external modules will be tough. Even
CPython has trouble with that. But that's just a conversion
problem. Maybe SWIG (yuck, but it exists) could be persuaded
to cooperate.

For regular expressions, here's an implementation, in C++,
of Python-like regular expressions.

http://linuxgazette.net/issue27/mueller.html

That might be a way to get a regular expression capability into
Shed Skin quickly.

> Finally, there may be some interesting Master's thesis subjects in
> improving Shed Skin, such as transforming heap allocation into stack-
> and static preallocation, where possible, to bring performance even
> closer to manual C++. Please let me know if you are interested in
> helping out, and/or join the Shed Skin mailing list.

Find out where the time is going before spending it on that.

A good test: BeautifulSoup. Many people use it for parsing
web pages, and it's seriously compute-bound.

John Nagle

Kay Schluehr

Apr 1, 2007, 03:41:50
On Mar 31, 11:26 pm, "Luis M. González" <luis...@gmail.com> wrote:
> On Mar 31, 8:38 am, Bjoern Schliessmann <usenet-
>
>
>
> mail-0306.20.chr0n...@spamgourmet.com> wrote:
> > Mark Dufour wrote:
> > > Shed Skin allows for translation of pure (unmodified), implicitly
> > > statically typed Python programs into optimized C++, and hence,
>
> > ^^^^^
> > > highly optimized machine language.
>
> > ^^^^^^^^^^^^^^^^
>
> > Wow, I bet all C++ compiler manufacturers would want you to work for
> > them.
>
> > Regards,
>
> > Björn
>
> > --
> > BOFH excuse #23:
>
> > improperly oriented keyboard
>
> Mark has been doing an heroic job so far.
> Shedskin is an impressive piece of software and, if pypy hadn't been
> started some time ago, it should have gotten more attention from the
> community.
> I think he should be taken very seriously.

Indeed. The only serious problem from an acceptance point of view is
that Mark tried to solve the more difficult problem first and got hung up on
it. Instead of integrating a translator/compiler early with CPython,
doing some factorization of Python module code into compilable and
interpretable functions (which can be quite rudimentary at first)
together with some automatically generated glue code, and *always having
a running system* with monotone benefit for all Python code, he seemed
to take on an impossible task, namely translating the whole of Python to C++,
and therefore created a "lesser Python". I do think this is now a well-
identified anti-pattern, but nothing that can't be repaired in this
case - from what I understand. However, speaking for myself, I won't
get my hands dirty with C++ code unless I get paid *well* for it.
That would be like duplicating my day job in my spare time. No go. Otherwise it
wouldn't be a big deal to do what is necessary here and even extend
the system with a perspective on Py3K annotations or other means to ship
typed Python code into the compiler.


mark....@gmail.com

Apr 1, 2007, 11:05:41
> Anyway, the only real point is that if there is a concern about the
> copyright and licensing of the output of ShedSkin, then we merely need

> to ask the author of it to clarify matters and move on with life. With
> the exception of GNAT, to date no GPL'd compiler has ever placed a GPL
> restriction on its output. Whether this is explicit or implicit doesn't
> matter, so long as it's there.

it's fine if people want to create non-GPL software with Shed Skin. it
is at least my intention to only have the compiler proper be GPL
(LICENSE states that the run-time libraries are BSD..)


mark dufour (Shed Skin author).

John Nagle

Apr 1, 2007, 12:07:55
Kay Schluehr wrote:
> Indeed. The only serious problem from an acceptance point of view is
> that Mark tried to solve the more difficult problem first and hung on
> it. Instead of integrating a translator/compiler early with CPython,
> doing some factorization of Python module code into compilable and
> interpretable functions ( which can be quite rudimentary at first )
> together with some automatically generated glue code and *always have
> a running system* with monotone benefit for all Python code he seemed
> to stem an impossible task, namely translating the whole Python to C++
> and created therefore a "lesser Python".

Trying to incrementally convert an old interpreter into a compiler
is probably not going to work.

> Otherwise it
> wouldn't be a big deal to do what is necessary here and even extend
> the system with perspective on Py3K annotations or other means to ship
> typed Python code into the compiler.

Shed Skin may be demonstrating that "annotations" are unnecessary
cruft and need not be added to Python. Automatic type inference
may be sufficient to get good performance.

The Py3K annotation model is to some extent a repeat of the old
Visual Basic model. Visual Basic started as an interpreter with one
default type, which is now called Variant, and later added the usual types,
Integer, String, Boolean, etc., which were then manually declared.
That's where Py3K is going. Shed Skin may be able to do that job
automatically, which is a step forward and more compatible with
existing code. Doing more at compile time means doing less work
at run time, where it matters. This looks promising.

John Nagle

mark....@gmail.com

Apr 1, 2007, 14:30:52

> I don't see how that can be--we're talking about a GCC-based compiler,
> right?

no, Shed Skin is a completely separate entity that outputs C++ code.
it's true I only use GCC to test the output, and I use some GCC-
specific extensions (__gnu_cxx::hash_map/hash_set), but people have
managed to compile things with Visual Studio or whatever it is
called.

btw, the windows version of Shed Skin comes with GCC so it's easy to
compile things further (two commands, 'ss program' and 'make run'
suffice to compile and run some program 'program.py')

Paul Rubin

Apr 1, 2007, 15:21:56
mark....@gmail.com writes:
> > I don't see how that can be--we're talking about a GCC-based compiler,
> > right?
>
> no, Shed Skin is a completely separate entity,

I was referring to GNAT.

Kay Schluehr

Apr 1, 2007, 15:25:29
On Apr 1, 6:07 pm, John Nagle <n...@animats.com> wrote:
> Kay Schluehr wrote:
> > Indeed. The only serious problem from an acceptance point of view is
> > that Mark tried to solve the more difficult problem first and hung on
> > it. Instead of integrating a translator/compiler early with CPython,
> > doing some factorization of Python module code into compilable and
> > interpretable functions ( which can be quite rudimentary at first )
> > together with some automatically generated glue code and *always have
> > a running system* with monotone benefit for all Python code he seemed
> > to stem an impossible task, namely translating the whole Python to C++
> > and created therefore a "lesser Python".
>
> Trying to incrementally convert an old interpreter into a compiler
> is probably not going to work.

I'm talking about something that is not very different from what Psyco
does, but Psyco works at runtime and makes continuous measurements for
deciding whether it can compile some bytecodes just-in-time or let the
interpreter perform their execution. You can also try a different
approach and decide statically whether you can compile some function
or interpret it. Then you factorize each module m into m = m_native *
m_interp. This factorization shall depend only on the capabilities of
the translator / native compiler and the metadata available for your
functions. Since you take care of the correct interfaces and glue code
early and maintain them continually, you never run into severe
integration problems.

----------------------------------------------------------

A factorization always follows a certain pattern that preserves the
general form and creates a specialization:

def func(x,y):
    # algorithm

====>

from native import func_int_int

def func(x,y):
    if isinstance(x, int) and isinstance(y, int):
        return func_int_int(x,y)  # wrapper of natively compiled specialized function
    else:
        # perform original unmodified algorithm on bytecode interpreter

Or in decorator notation:

from native import func_int_int

@apply_special( ((int, int), func_int_int) )
def func(x,y):
    # algorithm

where apply_special transforms the first version of func into the
second version.
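
A minimal sketch of what such an apply_special decorator could look like
(hypothetical glue code; func_int_int stands in for a natively compiled
wrapper, and only exact positional dispatch is handled):

def apply_special(*specializations):
    # each specialization is a (types, compiled_func) pair; calls whose
    # argument types match a registered signature go to the compiled
    # version, everything else falls back to the interpreted original
    def decorator(original):
        def wrapper(*args):
            for types, compiled in specializations:
                if len(args) == len(types) and all(
                        isinstance(a, t) for a, t in zip(args, types)):
                    return compiled(*args)
            return original(*args)
        return wrapper
    return decorator

With this definition the decorator form above behaves like the explicit
isinstance version of func.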

Now we have the correct form, and the real and hard work can begin, i.e.
the part Mark was interested and engaged in.

>
> > Otherwise it
> > wouldn't be a big deal to do what is necessary here and even extend
> > the system with perspective on Py3K annotations or other means to ship
> > typed Python code into the compiler.
>
> Shed Skin may be demonstrating that "annotations" are unnecessary
> cruft and need not be added to Python. Automatic type inference
> may be sufficient to get good performance.

You still dream of this, don't you? Type inference in dynamic languages
doesn't scale. It didn't scale in twenty years of research on
Smalltalk and it doesn't in Python. However, there is no no-go theorem
that prevents ambitious newbies to type theory from wasting their time and
efforts.

> The Py3K annotation model is to some extent a repeat of the old
> Visual Basic model. Visual Basic started as an interpreter with one
> default type, which is now called Variant, and later added the usual types,
> Integer, String, Boolean, etc., which were then manually declared.
> That's where Py3K is going.

Read the related PEP, John. You will see that Guido's genius is that of
a good project manager in this case, one who knows that the community works
for him. The trade is that he supplies the syntax/interface and the
hackers around him fantasize about semantics and start
implementations. Not only are the annotations optional, but so is their
meaning. This has nothing to do with VB, and it has not even much to do
with what existed before in language design.

To give an example of annotation semantics:

def func(x:int, y:int):
    # algorithm

can be translated according to the same pattern as above. The meaning
of the annotation, according to the envisioned annotation handler, is as
follows: try to specialize func on the types of the arguments and
perform local type inference. When successful, compile func with
these arguments and apply the apply_special decorator. When translation
is unfeasible, emit a warning. If a type violation is detected under
this specialization, emit a warning, or raise an exception in strict-
checking mode.
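
To make that concrete, a rough sketch of such an annotation handler
(hypothetical: compile_backend stands for whatever translator produces
the specialized native function, and may return None when translation
is unfeasible):

import inspect
import warnings

def specialize_on_annotations(compile_backend):
    # read the argument annotations, ask the backend for a version
    # specialized on those types, and dispatch to it when the runtime
    # argument types match; otherwise run the original on the interpreter
    def decorator(original):
        hints = tuple(p.annotation for p in
                      inspect.signature(original).parameters.values())
        compiled = compile_backend(original, hints)
        if compiled is None:
            warnings.warn("translation unfeasible for %s" % original.__name__)
            return original
        def wrapper(*args):
            if len(args) == len(hints) and all(
                    isinstance(a, t) for a, t in zip(args, hints)):
                return compiled(*args)
            return original(*args)
        return wrapper
    return decorator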

I fail to see how this violates duck-typing and brings VisualBasic to
the Python community. But maybe I just underrate VB :)

Kay

John Nagle

Apr 1, 2007, 16:34:49
Kay Schluehr wrote:
> On Apr 1, 6:07 pm, John Nagle <n...@animats.com> wrote:
>
>>Kay Schluehr wrote:
>>
>>>Indeed. The only serious problem from an acceptance point of view is
>>>that Mark tried to solve the more difficult problem first and hung on
>>>it. Instead of integrating a translator/compiler early with CPython,
>>>doing some factorization of Python module code into compilable and
>>>interpretable functions ( which can be quite rudimentary at first )
>>>together with some automatically generated glue code and *always have
>>>a running system* with monotone benefit for all Python code he seemed
>>>to stem an impossible task, namely translating the whole Python to C++
>>>and created therefore a "lesser Python".
>>
>> Trying to incrementally convert an old interpreter into a compiler
>>is probably not going to work.
>
>
> I'm talking about something that is not very different from what Psyco
> does but Psyco works at runtime and makes continuous measurements for
> deciding whether it can compile some bytecodes just-in-time or let the
> interpreter perform their execution.

     That can work.  That's how the Tamarin JIT compiler does JavaScript
inside Mozilla. The second time something is executed interpretively,
it's compiled. That's a tiny JIT engine, too; it's inside the Flash
player. Runs both JavaScript and ActionScript generated programs.
Might be able to run Python, with some work.

> A factorization always follows a certain pattern that preserves the
> general form and creates a specialization:
>
> def func(x,y):
>     # algorithm
>
> ====>
>
> from native import func_int_int
>
> def func(x,y):
>     if isinstance(x, int) and isinstance(y, int):
>         return func_int_int(x,y)  # wrapper of natively compiled specialized function
>     else:
>         # perform original unmodified algorithm on bytecode interpreter

You can probably offload that decision onto the linker by creating
specializations with different type signatures and letting the C++
name resolution process throw out the ones that aren't needed at
link time.

>>>Otherwise it
>>>wouldn't be a big deal to do what is necessary here and even extend
>>>the system with perspective on Py3K annotations or other means to ship
>>>typed Python code into the compiler.
>>
>> Shed Skin may be demonstrating that "annotations" are unnecessary
>>cruft and need not be added to Python. Automatic type inference
>>may be sufficient to get good performance.
>
>
> You still dream of this, isn't it? Type inference in dynamic languages
> doesn't scale. It didn't scale in twenty years of research on
> SmallTalk and it doesn't in Python.

I'll have to ask some of the Smalltalk people from the PARC era
about that one.

> However there is no no-go theorem
> that prevents ambitious newbies to type theory from wasting their time and
> efforts.

Type inference analysis of Python indicates that types really don't
change all that much. See

http://www.python.org/workshops/2000-01/proceedings/papers/aycock/aycock.html

Only a small percentage of Python variables ever experience a type change.
So type inference can work well on real Python code.

The PyPy developers don't see type annotations as a win. See Carl Friedrich Bolz's
comments in

http://www.velocityreviews.com/forums/t494368-p3-pypy-10-jit-compilers-for-free-and-more.html

where he writes:

"Also, I fail to see how type annotations can have a huge speed-advantage
versus what our JIT and Psyco are doing."

>>The Py3K annotation model is to some extent a repeat of the old
>>Visual Basic model. Visual Basic started as an interpreter with one
>>default type, which is now called Variant, and later added the usual types,
>>Integer, String, Boolean, etc., which were then manually declared.
>>That's where Py3K is going.
>

> This has nothing to do with VB and it has not even much to do
> with what existed before in language design.

Type annotations, advisory or otherwise, aren't novel. They
were tried in some LISP variants. Take a look at this
experimental work on Self, too.

http://www.cs.ucla.edu/~palsberg/paper/spe95.pdf

Visual Basic started out more or less declaration-free, and
gradually backed into having declarations. VB kept a "Variant"
type, which can hold anything and was the implicit type.
Stripped of the Python jargon, that's what's proposed for Py3K.
Just because it has a new name doesn't mean it's new.

It's common for languages to start out untyped and "simple",
then slowly become more typed as the limits of the untyped
model are reached.

Another thing that can go wrong with a language: if you get too hung
up on providing ultimate flexibility in the type and object system,
too much of the language design and machinery is devoted to features
that are very seldom used. C++ took that wrong turn a few years ago,
when the language designers became carried away with their template
mechanism, to the exclusion of fixing the real problems that drive their
user base to Java or C#.

Python, the language, is in good shape. It's the limitations
of the CPython implementation that are holding it back. It looks
like at least two projects are on track to go beyond the
limitations of that implementation. This is good.

John Nagle

mark....@gmail.com

Apr 2, 2007, 02:52:01

> You still dream of this, isn't it? Type inference in dynamic languages
> doesn't scale. It didn't scale in twenty years of research on
> SmallTalk and it doesn't in Python. However there is no no-go theorem

type inference sure is difficult business, and I won't deny there are
scalability issues, but the situation has improved a lot since back in
the smalltalk days. since then, type inference theory has not stood
still: agesen's cartesian product algorithm and plevyak's iterative
flow analysis (both published around '96) have greatly improved the
situation; a 1000-fold or more increase in computer speeds has
additionally made actual type inference (experimentation) much more
practical. (for anyone interested in the techniques powering shed
skin, see agesen and plevyak's phd theses for a relatively recent
update on the field.)
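
as a toy illustration of the cartesian product idea (nothing like the
actual implementation in Shed Skin): each call is analyzed once per
combination of concrete argument types, so one polymorphic call expands
into a set of monomorphic templates:

from itertools import product

def monomorphic_templates(arg_type_sets):
    # given the set of possible types for each argument of a call,
    # enumerate the monomorphic signatures the call is analyzed under
    return set(product(*arg_type_sets))

# a call f(x, y) where x may be int or float and y is always str
# expands into the two templates (int, str) and (float, str)
print(monomorphic_templates([{int, float}, {str}]))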

but in any case, I believe there are several reasons why type
inference scalability is actually not _that_ important (as long as it
works and doesn't take infinite time):

-I don't think we want to do type inference on large Python programs.
this is indeed asking for problems, and it is not such a bad approach
to only compile critical parts of programs (why would we want to
compile PyQt code, for example.) I do think type inference scales well
enough to analyze arbitrary programs of up to, say, 5,000 lines. I'm
not there yet with Shed Skin, but I don't think it's that far away (of
course I'll need to prove this now :-))

-type inference can be assisted by profiling (so dramatically fewer
iterations are necessary to come to a full proof). profiling doesn't
have to fully cover code, because type inference fills in the gaps;
type inference can also be assisted by storing and reusing analysis
results, so profiling only has to be done once, or the analysis can be
made easier by running it multiple times during development. because
Shed Skin doesn't use profiling or memoization, and I know many
things to improve the type analysis scalability, I'm confident it can
scale much further than the programs it works on now (see ss-
progs.tgz from the homepage for a collection of 27 programs, such as
ray tracers, chess engines, sat solvers, sudoku solvers, pystone and
richards..).
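
a small sketch of the kind of profiling support meant here
(hypothetical; as said above, Shed Skin itself doesn't do this yet):
record the concrete argument types a function is actually called with
during a profiling run, and hand those tuples to the analysis as
starting points:

from collections import defaultdict

observed_signatures = defaultdict(set)   # function name -> set of type-name tuples

def record_types(func):
    # wrap a function during a profiling run and remember the concrete
    # argument types it was called with; the recorded signatures could
    # later seed the type inference so fewer iterations are needed
    def wrapper(*args):
        observed_signatures[func.__name__].add(
            tuple(type(a).__name__ for a in args))
        return func(*args)
    return wrapper

# after one run of the program (or its test suite) with the wrapper
# applied, observed_signatures holds e.g. {'func': {('int', 'int')}}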

besides, (as john points out I think), there is a difference between
analyzing an actual dynamic language and an essentially static language
(such as the Python subset that Shed Skin accepts). it allows one to
make certain assumptions that make type inference easier.

> that prevents ambitious newbies to type theory from wasting their time and
> efforts.

yes, it's probably a waste of time to try and analyze large, actually
dynamic, Python programs, but I don't think we should want this at
all. computer speeds make Python fast enough for many purposes, and
global type inference scalability would demand us to throw out many
nice Python features. a JIT compiler seems better here..

where I think Shed Skin and similar tools can shine is in compiling
pure Python extension modules and relatively small programs. having
worked on type inference for some time now, with modern techniques :),
I see no reason why we can't compile statically typed Python programs,
up to several thousands of lines. my analysis works pretty well
already (see ss-progs.tgz), and there are many things I can still
improve, besides adding profiling and memoization..

> Read the related PEP, John. You will see that Guidos genius is that of
> a good project manager in that case who knows that the community works
> for him. The trade is that he supplies the syntax/interface and the
> hackers around him fantasize about semantics and start
> implementations. Not only annotations are optional but also their
> meaning. This has nothing to do with VB and it has not even much to do
> with what existed before in language design.

I think it's more Pythonic to just profile a program to learn about
actual types..

John Nagle

Apr 2, 2007, 03:17:35
mark....@gmail.com wrote:
> but in any case, I believe there are several reasons why type
> inference scalability is actually not _that_ important (as long as it
> works and doesn't take infinite time):
>
> -I don't think we want to do type inference on large Python programs.
> this is indeed asking for problems, and it is not such a bad approach
> to only compile critical parts of programs (why would we want to
> compile PyQt code, for example.) I do think type inference scales well
> enough to analyze arbitrary programs of up to, say, 5,000 lines. I'm
> not there yet with Shed Skin, but I don't think it's that far away (of
> course I'll need to prove this now :-))
>
> -type inference can be assisted by profiling

Something else worth trying: type inference for separately
compiled modules using the test cases for the modules. One
big problem with compile-time type inference is what to do
about separate compilation, where you have to make decisions
without seeing the whole program. An answer to this is to
optimize for the module's test cases. If the test cases
always use an integer value for a parameter, generate hard
code for the case where that variable is an integer.  As long
as there's some way to back down, at link time, to a more general
but slower version, programs will still run. If the test
cases reflect normal use cases for the module, this should
lead to generation of reasonable library module code.

> besides, (as john points out I think), there is a difference between
> analyzing an actual dynamic language and an essentially static language
> (such as the Python subset that Shed Skin accepts). it allows one to
> make certain assumptions that make type inference easier.

Yes. And, more than that, most programs have relatively
simple type behavior for most variables. The exotic stuff
just doesn't happen that often.

John Nagle

Paul Boddie

Apr 2, 2007, 06:31:01
On 2 Apr, 09:17, John Nagle <n...@animats.com> wrote:
>
> Something else worth trying: type inference for separately
> compiled modules using the test cases for the modules.

I mentioned such possibilities once upon a time:

http://blog.amber.org/2004/12/23/static-typing-and-python/

Note the subject of the original article, by the way. And as a
postscript, I'd advise anyone wondering what happened to Starkiller to
take a look at Shed Skin instead, since it more or less does what
Starkiller was supposed to do.

> One big problem with compile-time type inference is what to do
> about separate compilation, where you have to make decisions
> without seeing the whole program. An answer to this is to
> optimize for the module's test cases. If the test cases
> always use an integer value for a parameter, generate hard
> code for the case where that variable is an integer. As long
> as there's some way to back down, at link time, to a more general
> but slower version, programs will still run. If the test
> cases reflect normal use cases for the module, this should
> lead to generation of reasonable library module code cases.

People are always going to argue that a just-in-time compiler saves
everyone from thinking too hard about these issues, but I still think
that there's a lot of mileage in deducing types at compile time for a
number of reasons. Firstly, there are some applications where it might
be desirable to avoid the overhead of a just-in-time compilation
framework - I can imagine that this is highly desirable when
developing software for embedded systems. Then, there are applications
where one wants to know more about the behaviour of the code, perhaps
for documentation purposes or to help with system refactoring, perhaps
to minimise the risk of foreseeable errors. Certainly, if this latter
area were not of interest, there wouldn't be tools like pylint.

Paul

P.S. Another aspect of the referenced article that is worth noting is
the author's frustration with the state of the standard library:
something which almost always gets mentioned in people's pet Python
hates, but something mostly ignored in the wider enthusiasm for
tidying up the language.

Kay Schluehr

Apr 2, 2007, 06:31:04
On Apr 2, 9:17 am, John Nagle <n...@animats.com> wrote:

> mark.duf...@gmail.com wrote:
> > but in any case, I believe there are several reasons why type
> > inference scalability is actually not _that_ important (as long as it
> > works and doesn't take infinite time):
>
> > -I don't think we want to do type inference on large Python programs.
> > this is indeed asking for problems, and it is not such a bad approach
> > to only compile critical parts of programs (why would we want to
> > compile PyQt code, for example.) I do think type inference scales well
> > enough to analyze arbitrary programs of up to, say, 5,000 lines. I'm
> > not there yet with Shed Skin, but I don't think it's that far away (of
> > course I'll need to prove this now :-))
>
> > -type inference can be assisted by profiling
>

> Something else worth trying: type inference for separately
> compiled modules using the test cases for the modules.

Seems like we agree on this point. The idea of defining a type-
recording phase and identifying it with unit test (UT) execution was actually my
original motivation to consider alternative frameworks for compiling
Python.

> One
> big problem with compile-time type inference is what to do
> about separate compilation, where you have to make decisions
> without seeing the whole program. An answer to this is to
> optimize for the module's test cases. If the test cases
> always use an integer value for a parameter, generate hard
> code for the case where that variable is a integer. As long
> as there's some way to back down, at link time, to a more general
> but slower version, programs will still run. If the test
> cases reflect normal use cases for the module, this should
> lead to generation of reasonable library module code cases.

The nice thing about this idea is that your type system is compliant
with your test base. When a test succeeds, it cannot be invalidated by
a recorded type that gets annotated. You really get some metadata and
program description elements for free. There are also some caveats
about subtyping and about using type information that is too special - or not
special enough. In the first case, the maximal nominal type that
includes the recorded type as a subtype can be chosen, one that is
compliant with the structural properties of the type, i.e. the interface
description of the class. I have no idea about deeper
specialization. Guess this can't be handled automatically.

Methodologically this means that testing and important aspects of
optimizing Python code are not separated from each other. This is
elegant and will increase overall code quality and acceptance of the
language.

Ciao,
Kay

bearoph...@lycos.com

Apr 2, 2007, 07:05:23
Paul Boddie:

> the author's frustration with the state of the standard library:
> something which almost always gets mentioned in people's pet Python
> hates, but something mostly ignored in the wider enthusiasm for
> tidying up the language.

There is some possibility that Python 3.1 will have what you ask for:
http://www.python.org/dev/peps/pep-3108/

Bye,
bearophile

Paul Boddie

Apr 2, 2007, 07:27:33
On 2 Apr, 13:05, bearophileH...@lycos.com wrote:
>
> There is some possibility that Python 3.1 will have what you ask for: http://www.python.org/dev/peps/pep-3108/

Prior to that PEP being written/published, I made this proposal:

http://wiki.python.org/moin/CodingProjectIdeas/StandardLibrary/RestructuredStandardLibrary

After being brought to the attention of the PEP's author, it seems to
have been swept under the carpet on the Wiki, but it's a more radical
proposal in a number of ways than the PEP seems to be.

Paul

bearoph...@lycos.com

Apr 2, 2007, 10:27:48
Paul Boddie:

> Prior to that PEP being written/published, I made this proposal:
> http://wiki.python.org/moin/CodingProjectIdeas/StandardLibrary/Restru...

At first sight it looks good. Python 3.0-3.1 is the best and probably
only possibility for such an improvement (I have said 3.1 too because I
think Guido will allow some corrections in the version following
3.0).

Bye and thank you,
bearophile

Kay Schluehr

Apr 2, 2007, 14:17:03
On Apr 2, 1:27 pm, "Paul Boddie" <p...@boddie.org.uk> wrote:
> On 2 Apr, 13:05, bearophileH...@lycos.com wrote:
>
>
>
> > There is some possibility that Python 3.1 will have what you ask for: http://www.python.org/dev/peps/pep-3108/
>
> Prior to that PEP being written/published, I made this proposal:
>
> http://wiki.python.org/moin/CodingProjectIdeas/StandardLibrary/Restru...

>
> After being brought to the attention of the PEP's author, it seems to
> have been swept under the carpet on the Wiki, but it's a more radical
> proposal in a number of ways than the PEP seems to be.
>
> Paul

Note that the conflict between putting modules at the top level or, better,
within separate packages is not an either-or decision from the
point of view of a programmer who just wants to access those modules. A
top level module like lib or std can be pretty virtual, since you can
create modules at runtime whenever you try to import them. I used this
strategy for a project where editing objects in separate files led
to a better overview compared to one large file containing all
definitions. However, I created one module at runtime that served as a
common access point for all these particular definitions, which were
tedious to import separately and would have required file system
lookups quite often. This might even allow multiple classifications,
but I haven't experimented with that yet.
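
A minimal sketch of one way to build such a runtime access point
(hypothetical names; the 'defs' package and its submodules stand in for
the separate definition files): create a module object on the fly,
populate it from the individual files, and register it in sys.modules
so a plain import finds it afterwards:

import sys
import types
import importlib

def build_access_point(name, submodules, package='defs'):
    # create a virtual top level module `name` at run time and re-export
    # the public contents of the given submodules through it, so client
    # code can simply `import std` instead of importing each file
    virtual = types.ModuleType(name)
    for sub in submodules:
        mod = importlib.import_module('%s.%s' % (package, sub))
        for attr in getattr(mod, '__all__',
                            [a for a in dir(mod) if not a.startswith('_')]):
            setattr(virtual, attr, getattr(mod, attr))
    sys.modules[name] = virtual
    return virtual

# e.g. build_access_point('std', ['geometry', 'units'])  # hypothetical layout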

Kay

Paul Boddie

Apr 3, 2007, 06:13:55
On 2 Apr, 20:17, "Kay Schluehr" <kay.schlu...@gmx.net> wrote:
>
> Note that the conflict of putting modules on top level or better
> within separate packages is not an either-or decision from a
> programmers point of view who just wants to access those modules. A
> top level module like lib or std can be pretty virtual since you can
> create modules at runtime whenever you try to import them.

Or, if the subpackages/submodules are small enough, just import them
and make their contents available at higher levels in the hierarchy.

> I used this strategy for a project where editing objects in separate files led
> to a better overview compared to one large file containing all
> definitions. However I created one module at runtime that served as a
> common access point for all these particular definitions that were
> tedious to import separately and would have required file system
> lookups quite often. This might even allow multiple classifications
> but I haven't experimented with them yet.

Yes, I tend to make aliases available quite a bit. It seems to me that
introducing a hierarchy into the standard library has a fairly limited
cost: you need to occupy some more top-level names, some unfortunately
being potentially common, but any growth in the library isn't as
likely to introduce more naming conflicts - something that anyone with
their own calendar.py, new.py or xml.py programs might find
desirable. ;-)

One problem I've become more aware of, however, is the role of C-based
modules in the library. I think there was an unresolved issue about
such modules living at non-root locations in the library hierarchy,
but aside from that, the work required to clean up any abstractions
eventually requires people to roll up their sleeves and look at non-
Python code.

Paul
