static single type assignment

88 views
Skip to first unread message

Will 'Varfar'

unread,
Sep 30, 2011, 9:32:22 AM9/30/11
to shedskin-discuss
I made the term SSTA up :)

There's an issue 144 (http://code.google.com/p/shedskin/issues/detail?
id=144) (its merged into 98 but I haven't done the due diligence on
how exactly they overlap)

Basically, if you re-purpose the same identifier for another type
shedskin will say its ambiguous - "expression has dynamic (sub)type".

I tend to re-use names a lot in my python code and its a hard habit
for me to break.

I was wondering how best to go about adding support to shedskin for
it?

The example given in 144:

a=[1,2,3]
b=['ab','cd']
for i in a:
print i
for i in b:
print i

currently becomes:

str *__name__, *i;
;;;
FOR_IN(i,a,0,2,3)
;;;
FOR_IN(i,b,4,6,7)
;;;

The approach I had in mind is SSTA, by which I mean:

Track aliases for variable names in a block. Whenever there is an
assignment to a variable at the same block level as the first
initialisation of the variable, alias it to another name for output.
So the basic example:

str *__name__, *i__1, *i__2;
;;;
FOR_IN(i__1,a,0,2,3)
;;;
FOR_IN(i__2,b,4,6,7)
;;;

And, more in line with my usage pattern:

a = 1
...
a = ""

would get two versions of "a" coexisting because they are declarations
at the same level with the same AST parent node.

The C++ compiler is doing its own peepholing and ssa and such so its
going to reuse the memory and such anyway; I wouldn't expect this to
bloat the compiled executable footprint.

I have started trying to trace the paths through graph.py, but am most
interested in feedback, suggestions, warnings about edge cases and
such?

John Yeung

unread,
Sep 30, 2011, 2:43:21 PM9/30/11
to shedskin...@googlegroups.com
Well, it would be one thing if for loops created their own scope, and
the "for variable" went out of existence once the loop was exited.
(List comprehensions in Python 3 are like this.)

But since the for variable is really just a regular variable in the
same scope as whatever contains the for loop, Python programs are free
to make use of that fact. What do you propose for the following:

a=[1,2,3]
b=['ab','cd']
for i in a:
print i

print i # global integer


for i in b:
print i

print i # global string
i = 'ef'

I think it will be tough. I would guess there are plenty of cases I
haven't considered. But, if you can somehow manage to ensure that the
for variables are really only being used inside the loop, and accept
only those cases, then I think most ShedSkin users would welcome that
capability.

John

Fahrzin Hemmati

unread,
Sep 30, 2011, 4:21:09 PM9/30/11
to shedskin...@googlegroups.com
I believe you changed the topic of his post. He was talking about making
subsequent assignments of other types to the same variable name into
separate variables in the generated C++.

You could, instead of tracking integers like that, just make the
variables type-based:

int int_i;
char * char_ptr_i;

That way, any collisions are automatically ignored.

However, this should be a separate flag since it can cause unwanted
behavior. In many cases, I set, as a default, a variable to None. Then
later, I set it to whatever I get out of a dict or list, which sometimes
means an integer. If I'm not writing it for shedskin use at the time, I
won't hit a problem since None is 'falsey'. But when I later push it to
shedskin, None isn't an integer type but is null or '\0', causing type
collisions with an integer later. This is an actual problem that I
should fix by using 0 or -1, using two separate variables will just
cause huge errors when I use the char* version of the variable for
testing but assign to the int version since that comes later.

Will asked for warnings, that's one that I've run into basically every
time I moved from Python to shedskin as an afterthought to the code.
Though I do see the benefit, after all, I'm the creator of issue id 98 :)

--Fahrzin Hemmati

Will 'Varfar'

unread,
Sep 30, 2011, 5:18:11 PM9/30/11
to shedskin-discuss
To explain my (possibly flawed) plan:

First, strictly local variables and secondly strictly when the
reassignment is at the same level (indent and parent) so here's some
things:

m = ''
def func():
z = 0
for i in xrange(10):
z = '' <--- can't re-assign type since its not at the same
level
x = 3
x = '' <--- can't re-assign, its not the same level either
z = '' <--- works
m += '' <--- someone remind me what the shedskin rules ought to be
for this; is this the global or is this reading the local before its
assigned?
m = 0 <--- works because its aliasing the global
for i in ('a','b','c'): <--- re-assigning type of i - same level,
same parent
x = '' <--- can't re-assign because its not the same parent
m = '' <--- same level as the local variable called m
m = 0 <--- fails because its the same level and same parent but its
not local, it belongs to the module so its global

John Yeung

unread,
Sep 30, 2011, 5:29:10 PM9/30/11
to shedskin...@googlegroups.com
On Fri, Sep 30, 2011 at 4:21 PM, Fahrzin Hemmati <fah...@gmail.com> wrote:
> I believe you changed the topic of his post. He was talking
> about making subsequent assignments of other types to
> the same variable name into separate variables in the
> generated C++.

Perhaps. Issue 144 was specifically about loop variables, though. Of
course I trust Mark to merge issues he thinks are related, as he is
the most familiar with how things are implemented in ShedSkin. But in
my opinion, the situation described in issue 98 is more general and
more tricky to handle well than issue 144 (and its duplicate, 151).

So many people use a throwaway variable in for loops that it seems
like it would be easier to recognize this particular use case and make
a mechanism just for it.

John

John Yeung

unread,
Sep 30, 2011, 5:43:35 PM9/30/11
to shedskin...@googlegroups.com
On Fri, Sep 30, 2011 at 5:18 PM, Will 'Varfar' <willv...@gmail.com> wrote:
> To explain my (possibly flawed) plan:
>
> First, strictly local variables and secondly strictly when the
> reassignment is at the same level (indent and parent) so here's
> some things:
>
> m = ''
> def func():
>   z = 0
>   for i in xrange(10):
>       z = ''   <--- can't re-assign type since its not at the same
> level
>       x = 3
>   x = ''   <--- can't re-assign, its not the same level either
>   z = ''   <--- works
>   m += ''  <--- someone remind me what the shedskin rules ought to be
> for this; is this the global or is this reading the local before its
> assigned?

Well, the point of ShedSkin is to do what Python does, as much as
possible, so in this case, m is local to func, and this is therefore
an UnboundLocalError.

>   m = 0    <--- works because its aliasing the global

Maybe we understand "alias" differently. In Python, this is a local
variable that *masks* the global. It is completely different than the
global and can be freely assigned within func.

In general, I didn't follow your examples at all. But I haven't
really used ShedSkin other than some hello-world-type stuff, so my
understanding of ShedSkin could just be insufficient. I imagine
Fahrzin understood what you were saying.

John

willv...@gmail.com

unread,
Oct 1, 2011, 4:29:40 AM10/1/11
to shedskin...@googlegroups.com

Yes by 'alias' I meant 'mask' and you clarified the python scope rules for me.

At the moment shesdkin is a bit too restricted for my taste regards reusing a local variable name as a different type.  By relegating a local variable to the tempvars when it is reassigned at the same ident/parent as the previous assignment i hope to broaden the python that shedakin accepts.

On Sep 30, 2011 11:43 PM, "John Yeung" <gallium....@gmail.com> wrote:

On Fri, Sep 30, 2011 at 5:18 PM, Will 'Varfar' <willv...@gmail.com> wrote:

> To explain my (possib...

Well, the point of ShedSkin is to do what Python does, as much as
possible, so in this case, m is local to func, and this is therefore
an UnboundLocalError.


>   m = 0    <--- works because its aliasing the global

Maybe we understand "alias" differently.  In Python, this is a local
variable that *masks* the global.  It is completely different than the
global and can be freely assigned within func.

In general, I didn't follow your examples at all.  But I haven't
really used ShedSkin other than some hello-world-type stuff, so my
understanding of ShedSkin could just be insufficient.  I imagine
Fahrzin understood what you were saying.

John


--
You received this message because you are subscribed to the Google Groups "shedskin-discuss" gr...

willv...@gmail.com

unread,
Oct 3, 2011, 3:55:46 PM10/3/11
to shedskin...@googlegroups.com
I didn't really find my way around infer.py so I made a little
prototype myself just to get my idea straightened out:

def test(w_a,w_b=None,*args,**kwargs):
w_a=[1,2,3]
w_b=['ab','cd']
for w_i in w_a:
print w_i
for w_i in w_b:
print w_i
for w_i,w_j in w_b:
print w_i,w_j
w_j = 2

becomes (not sorted by line):

Function test
var w_i: #9: for w_i,w_j in w_b: (assign_from w_b[0])
<- w_i: #7: for w_i in w_b: (assign_from w_b)
<- w_i: #5: for w_i in w_a: (assign_from w_a)
var w_j: #11: w_j = 2 (is_a <type 'int'>)
<- w_j: #9: for w_i,w_j in w_b: (assign_from w_b[1])
var args: (is_a *args,is_a arg[2],is_a list)
var w_a: #3: w_a=[1,2,3] (assign_from (list_of ('is_a', <type
'int'>),is_a list))
<- w_a: (is_a arg[0])
var w_b: #4: w_b=['ab','cd'] (assign_from (list_of ('is_a', <type
'str'>),is_a list))
<- w_b: (is_a arg[1])
var kwargs: (is_a **kwargs,is_a arg[3],is_a dict)

Trying to work out how to make a inferred static typing from scratch
is a fun problem.

I currently walk the module looking for the first assignment to each
variable, or reassignment if unconditional, and noting them. I then
pass through a second time working out the references and noticing the
types, even if its just to say 'same as other variable'. I would
think its then to iterate over those variables who have 'assign_from'
and fold them in when the assigned from has no assign_from themselves,
until all are folded or until its a stalemate. How does this compare
to the shedskin approach?

Will 'Varfar'

unread,
Oct 6, 2011, 7:53:35 AM10/6/11
to shedskin-discuss
here's a better way to illustrate what I mean about scoping variables
by their (re)assignment:

https://gist.github.com/1267214

save the file locally and then, in a browser, move over the
identifiers.

Blue is the variable assignment, green is a variable reference and
orange is a variable that has been superseded by SSTA.

Every time we handle an assignment (including to the target of a for
loop) we see if the AST node's statement-level parent is a parent or
peer of the previous declaration of that name (I consider variable,
class and function declarations as simply names). If so, then SSTA
lets us supersede that declaration from that point onwards.

When then resolving variables we have to check to see which SSTAed
variable for the name to use, based on the traversal order of the
reference and the declaration. Care must be taken when resolving
those variables referred to in nested functions to ensure that the
declarations they resolve to have no SSTA superceding occurring (i.e.
if encountered, merge them so constraints apply to all - it becomes
one variable in the output) and that the variable is declared before
the first invocation of the function and such.

This html output is from the beginnings of my own python->c++
playground code.

I can see how one might synthesise nested functions and closures and
instance method variables using structs

/Will

Fahrzin Hemmati

unread,
Oct 6, 2011, 7:56:39 AM10/6/11
to shedskin...@googlegroups.com, Will 'Varfar'
The gist can be found browser-renderable here:
http://fahhem.com/shedskin/varfar.html

Will 'Varfar'

unread,
Oct 10, 2011, 12:23:49 PM10/10/11
to shedskin-discuss
At the risk of talking to myself, but this is a fun hobby coding
exercise:

The SSTA pass is just the first step and I finally see the bigger
picture where its followed with escape analysis.

By a second escape-analysis path you can ignore the depth of the
assignment it is superseding, as long as you join together SSTAs into
a single variable whenever it is referenced at an outer scope.

You also trivially know if variables have been dereferenced and such,
so you can imagine all kinds of concise optimisations by omission
(whenever the computation of any temporary in the assignment is side-
effect-free).

https://gist.github.com/1267214 has been updated to show this

My prototype has ballooned to ~300 LOC

srepmub

unread,
Oct 12, 2011, 4:24:49 PM10/12/11
to shedskin-discuss
sorry to not have chimed in sooner!

so yeah, I guess we can be smarter about reusing names.. but as can be
seen from the issues, I'm a bit hesitant to add complexity for
something that is easy to work around.. :-) in the past, I've gone on
a few similar tangents (escape analysis, variant types, generating C++
templates..), and always ended up ripping the result out, because the
added complexity became too much to handle. so there's this gut
feeling (right or wrong) that trying to support more than basic static
typing is going down a slippery slope..

but if you guys insist, I guess in this case things can be kept mostly
separate from the rest of the compiler. for example, by adding a pre-
processing stage, that changes variable names in the AST. we'd only
have to communicate renames somehow, so we can report the original
names in warning/error messages.

I'd be more interested in an escape analysis, as it can actually
improve performance in many cases. but this is probably best done at
code-generation time, since knowing the types/callgraph is essential
here. I'm not sure what you mean with your comment about this.. please
provide more examples with your comments.. ;-)

thanks,
mark.
> https://gist.github.com/1267214has been updated to show this

srepmub

unread,
Oct 12, 2011, 4:45:48 PM10/12/11
to shedskin-discuss

> I currently walk the module looking for the first assignment to each
> variable, or reassignment if unconditional, and noting them.  I then
> pass through a second time working out the references and noticing the
> types, even if its just to say 'same as other variable'.  I would
> think its then to iterate over those variables who have 'assign_from'
> and fold them in when the assigned from has no assign_from themselves,
> until all are folded or until its a stalemate.  How does this compare
> to the shedskin approach?

I guess things get a bit more complicated for larger/arbitrary
programs. there are some tricky situations that are hard to solve from
scratch.. ;-) you can find a short description of the three main
techniques (and a link to my master's thesis, which still describes
two of those) here:

http://en.wikipedia.org/wiki/Shedskin

mark.

Will 'Varfar'

unread,
Oct 17, 2011, 9:53:13 AM10/17/11
to shedskin-discuss
thanks for the encouragement :)

I cannot pretend to be charging away writing lots of code; its mainly
just a thought exercise.

Every time a local name is assigned to, it only has to be typed in the
scope that it is reachable in.

There are stark similarities for the scope of global and class names
but I am hesitant to imagine extending it generally to them until I'm
sure its workable locally.

The first trick is to register names as they are assigned in the
function either by an assignment or in a for/with/except/def/class
statement.

Then a second pass is to work out which of these assignments it is
referring to; if an assignment is conditional, it has to be 'joined'
with those assignments to that name that are still reachable.

Try/Except/(Else) blocks are rather more complicated, but I resist
just joining all variables that cross a try block; it would be nice to
imagine at least attempting to resolve which things can be thrown when
and such.

And the same is true of locals in a containing function - its easy if
they are simply calls, a bit harder if they are closures.

Out of this you could imagine an SSTA'd python source being generated
- python that clearly related to the original but all names
randomised. This would be a great way to test the correctness of any
SSTA algorithm under-fire and also useful input to shedskin - it could
be evaluated if it was a useful preprocessing step.

My mind jumps further ahead as I track the dereferencing of names it
becomes possible to know names that are never used, or to know the
index in tuples and such. A lot of python code uses tuples for
intermediate values and it would be able to spot them and turn them
into classic stack-based structs in the generated c++ code...

John Yeung

unread,
Oct 17, 2011, 6:58:55 PM10/17/11
to shedskin...@googlegroups.com
> The first trick is to register names as they are assigned in the
> function either by an assignment or in a for/with/except/def/class
> statement.

Don't forget that in Python, for and with (and list comprehensions in
Python 2) do NOT create new scope. Genuine Python programs can rely
on this; if you make it so that they have their own scope, the program
is not genuine Python anymore.

On the other hand, def and class do create new scope, and they are the
idiomatic way to create closures in Python.

John

willv...@gmail.com

unread,
Oct 18, 2011, 1:06:40 AM10/18/11
to shedskin...@googlegroups.com
> Don't forget that in Python, for and with (and list comprehensions in
> Python 2) do NOT create new scope.  Genuine Python programs can rely
> on this; if you make it so that they have their own scope, the program
> is not genuine Python anymore.

yes very much on my mind!

Interestingly def and class can be thought of as defining a type to go
with a name assignment. One of my test pieces is like this:

if .....:
def func():
b = 10
else:
def func():
b = 12
func()

This is departing slightly from the current shedskin restrictions but
I think this restriction can be lifted if, eventually after lots of
hacking and tedious testing, closures and inner functions generally
were modelled as c++ structs with the variables they refer to in the
struct and functions that are dereferenced or reassigned are modelled
as c++ functors. You could imagine that all working.

Well, in fact, as long as you don't have a compiler nor eval module,
you could imagine an entire introspection built into the c++ mapping
with true dynamic code (think more cython-like in that regard).

See, all projects grow until they exhaust reasonable attempts to start
on them :)

Mark Dufour

unread,
Oct 18, 2011, 12:16:57 PM10/18/11
to shedskin...@googlegroups.com


My mind jumps further ahead as I track the dereferencing of names it
becomes possible to know names that are never used, or to know the
index in tuples and such.  A lot of python code uses tuples for
intermediate values and it would be able to spot them and turn them
into classic stack-based structs in the generated c++ code...


yeah, heap-based tuples are a very common source of slowness in shedskin-compiled programs. turning tuples into stack-based structs would complicate code generation quite a bit however, since non-scalars are all pointer-types right now (this really keeps really simple).. but perhaps we can get much of the speedup using a different allocation scheme, where not everything is pure heap allocation (e.g. arena-based allocation as in cpython, caching schemes, a separate stack where short-lived objects live..).. so everything is still a pointer.

mark.
--
http://www.youtube.com/watch?v=E6LsfnBmdnk

willv...@gmail.com

unread,
Jan 2, 2012, 5:20:16 PM1/2/12
to shedskin-discuss
I have tried to elaborate on my ideas about SSTA on my blog:

http://williamedwardscoder.tumblr.com/post/15203221199/less-restricted-python

Mark Dufour

unread,
Jan 4, 2012, 4:18:57 PM1/4/12
to shedskin...@googlegroups.com
On Mon, Jan 2, 2012 at 11:20 PM, willv...@gmail.com <willv...@gmail.com> wrote:
I have tried to elaborate on my ideas about SSTA on my blog:

http://williamedwardscoder.tumblr.com/post/15203221199/less-restricted-python


I have had several of the same ideas/wishes as well, especially early on (around 5 years ago).. but in the end the added complexity usually didn't seem worth it, at least not for me, especially given that the respective restrictions are usually easy to work around (reusing variables for different types, closures and inline functions, automatic specialising, nullable primitive types..). building shed skin so far has been a humbling experience.. :-)

I'd love to have someone work on escape analysis in shedskin though. heap allocation is harder to work around manually.. ;-)

thanks,
mark.
--
http://www.youtube.com/watch?v=E6LsfnBmdnk

willv...@gmail.com

unread,
Jan 5, 2012, 7:10:31 PM1/5/12
to shedskin...@googlegroups.com
my thought was that an SSTA is a sourcecode transform, and that it
could be done as a preprocessing step. This would be a good way of
validating its correctness - all shedskin programs should survive it
run the same output as without SSTA, and some programs that currently
fail in shedskin should compile with SSTA.

Of course my hobby coding time is not really leaning towards hacking
Python compilers, however much I enjoy thinking about it abstractly ;/

/Will

Mark Dufour

unread,
Jan 7, 2012, 6:09:50 PM1/7/12
to shedskin...@googlegroups.com
On Fri, Jan 6, 2012 at 1:10 AM, willv...@gmail.com <willv...@gmail.com> wrote:
my thought was that an SSTA is a sourcecode transform, and that it
could be done as a preprocessing step.  This would be a good way of
validating its correctness - all shedskin programs should survive it
run the same output as without SSTA, and some programs that currently
fail in shedskin should compile with SSTA.

sure, yeah, that would work.. but imho, it would still be a lot of work for little gain, since the restriction is trivial to work around..

thanks,
mark.
--
http://www.youtube.com/watch?v=E6LsfnBmdnk

Tony Veijalainen

unread,
Jan 27, 2012, 8:34:46 AM1/27/12
to shedskin...@googlegroups.com
I would be happy to have possible of multiple types as parameters (especially numeric compatible types like int and float) as for example I tried to reimplement more simple fraction module and I could not find way to confirm to old interface, even I removed use of numeric, abc etc and using only hasattr to check duck typing style for the form of input.  It should be possible to call Fraction(fract) to make sure that fract is Fraction after that, for example, even normally it is called with integer parameters. I was kind of hoping to let the input to be float also, but that I could probably manage with only making another method or optional parameter to init.

Also what frustrates sometimes is the missing * unpacking, as I often need to transpose values of varíable number of elements. Of course I can prepare simple generator to replace zip, but still is different from CPython.


2012/1/8 Mark Dufour <mark....@gmail.com>

--
You received this message because you are subscribed to the Google Groups "shedskin-discuss" group.
To post to this group, send email to shedskin...@googlegroups.com.
To unsubscribe from this group, send email to shedskin-discu...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/shedskin-discuss?hl=en.

fractions.py
gcd_test.py

John Yeung

unread,
Jan 27, 2012, 4:23:04 PM1/27/12
to shedskin...@googlegroups.com
Tony,

A few quick notes:

Shed Skin is meant to compile a subset of valid Python 2 programs, but
it seems you are using Python 3. (Of course, a subset of valid Python
3 programs are also valid Python 2 programs, and a subset of these can
be compiled by Shed Skin.)

Shed Skin does not have built-in support for arbitrary-precision
integers like CPython has.

Your fractions.py source code has a syntax error near the bottom; it
seems like you used a right paren when you meant a single quote.

John

Reply all
Reply to author
Forward
0 new messages