Desugaring -> tools

10 views
Skip to first unread message

Joe Gibbs Politz

unread,
Apr 29, 2013, 9:13:29 PM4/29/13
to lamb...@googlegroups.com, Justin Pombrio
A specific question and a broad question

Specific question:

Say I want to perform lexical desugaring and transform the result back
into runnable Python. What problems will I run into? Is this just a
nonsensical thing to do? This form is more useful for tools, but is
it easy to get back to something that Python could run if, say, we did
a refactoring or code insertion at that level?

Broad question:

I've exchanged a few emails with a researcher who's prototyping a
gradual type system for Python (repo here:
https://github.com/mvitousek/reticulated), and stands to benefit from
(at the very least) the scope desugaring we provide, if not more. The
workflow of his tool is to get the Python AST, make changes to it
(wrapping functions to check that their arguments are correct, etc),
and then evaluate that AST using exec. I'm wondering how we could
help by providing a better starting point than a pure Python AST.
What's the best way for us to do that?

We already provide, in code, algorithms that he could replicate or
refer to in order to make sure he's covering all the cases. But what
more can we provide with the pipeline we already have?
Brainstorming/high-level suggestions welcome; this could be relevant
if we want to, say, help folks implement refactorings in Python IDEs
as well.

Matthew Milano

unread,
Apr 30, 2013, 2:18:24 AM4/30/13
to lamb...@googlegroups.com, Justin Pombrio
The majority of the phase1 transformation is easily reversible; the
only issues are that the phase removes nonlocal and global statements,
which is easy to take out. All you would likely have to do is remove
the Lexical-only constructs, turn things like f = def f() back into
just def f(), and revert all Lex*Ids to PyIds. For reference, the
main changes in place at the end of phase1 are explicit scope blocks
and PyIds replaced with local, global, and instance Ids.

Depending on what semantics he would expect for scope with classes, we
could write a transformation from sane-scope to python-scope following
similar algorithms to phase2, but this would be entirely new code.

So, as seems to often be the case with scope, everything except
classes is relatively easy to do with our current codebase.

~matthew
> --
> You received this message because you are subscribed to the Google Groups "lambda-py" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to lambda-py+...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Joe Gibbs Politz

unread,
Apr 30, 2013, 10:23:04 AM4/30/13
to lamb...@googlegroups.com, Justin Pombrio
Is it possible to associate every variable in the original Python
program with the block that binds it? That data structure might be
the most useful thing to provide.

Matthew Milano

unread,
Apr 30, 2013, 12:09:19 PM4/30/13
to lamb...@googlegroups.com, Justin Pombrio
That would be pretty easy to do given the phase1 transformation of the python program.

Joe Gibbs Politz

unread,
Apr 30, 2013, 1:46:07 PM4/30/13
to lamb...@googlegroups.com, Justin Pombrio
Hmm that's interesting. I can imagine this as sort of an overlay of
lexical information on top of the existing AST. What ambiguities
would be left by only doing this first phase? Could classes get
confused about instance variables and the like?

Matthew Milano

unread,
May 1, 2013, 2:33:18 AM5/1/13
to lamb...@googlegroups.com, Justin Pombrio
So classes won't get confused about instance variables; those are
actually all identified in phase1. phase2 actually *removes* all
instance variables and replaces them with get/set field. What you
will have which is less-than-ideal is the weird
locals-skip-class-bodies semantics. I'm not sure how relevant that is
to gradual typing - the one relevant edge case could be:

def f():
class c:
x = 3 #instance, returned as a local by locals() in class body
def g():
print(x) #x is global here
g()
f()
~matthew

Joe Gibbs Politz

unread,
May 7, 2013, 8:50:13 AM5/7/13
to lamb...@googlegroups.com, Justin Pombrio
I think we should consider building a Python AST imbued with lexical
information. This could take the form of a NodeVisitor and/or
NodeTransformer
(http://docs.python.org/3/library/ast.html#ast.NodeVisitor) that
associates each identifier with some metadata on each visit. This
would be a compelling demonstration for the paper's artifact
evaluation, and would certainly be immediately useful for
Python-implemented Python tools. It also would give us a foothold to
start putting more complicated metadata or desugarings into a form
that's immediately useful to folks using Python.

For example, we could implement an environment visitor that allows the
association of arbitrary data with identifiers at binding positions,
which then always gives the *correct* associated information back when
queried with particular identifiers. So if you were NodeVisiting our
favorite example:

def f(x: Int):
class c():
x = "foo"
print(x)
def m(self):
print(x)

At the top of f and c, there would be extra information about the
variable `x' being available for that block, and the programmer could
choose to associate information with each (say, annotations and
initial values). When the programmer looks up the information at the
use site of the first print(x), they would get, say `String' back, but
at the second they would get `Int'.

Concretely, I think the top-level implementation of such a tool should
happen in Python. It would parse the program itself and then call out
to \Py, which would return some JSON with this metadata, and then
create the Visitor/Transformer on top of that metadata. We need to
think carefully about which forms are binding forms and how we
represent the relationship between those places and identifiers, but
it should be quite doable.

(We could also implement phase1/phase2 in Python itself, but that
comes with all the attendant difficulties of mirrored implementations,
etc, so I think for a first cut, we should stay in Racket for the
lexical analysis itself.)

Is anyone interested in taking this on, in whole or in part (Justin,
you're welcome to as well, since this seems to be in your research
trajectory)? I'm willing to help out, of course.

Matthew Milano

unread,
May 7, 2013, 5:12:39 PM5/7/13
to lamb...@googlegroups.com, Justin Pombrio
If it's of interest to this effort, there already exists a lexical-phase pretty-printer that produces very surface-looking python code, with extra information like inserted comments and inserted curlies.  I've been using it to debug phase1 and phase2 transformations for a while now.  It's not fully complete, but was certainly mature enough to speed up my development cycle.  The function is called lexexpr-print and it's in the file python-lexical-printer.rkt

~matthew
Reply all
Reply to author
Forward
0 new messages