Some non-semantics tasks


Joe Gibbs Politz

Apr 2, 2013, 9:23:19 AM
to lamb...@googlegroups.com
There are a few tasks that I think would help our debugging immensely
that have nothing to do with semantics:

- Change the Store to be a pair ((hashof Address Value), Address),
where the stored address represents the next address to allocate.
This will let us serialize everything we need to store the results of
running libraries to disk, which should help our startup time when
running tests, since we can save ourselves running that boilerplate
every time. I'm taking a stab at this, but seem to have broken some
things; I may ask for a code review if I continue to be stuck.

- Carry source locations from surface Python down to the core. We
have this information at the top level, but we simply discard it when
we build our ASTs in desugaring. We should add a position to every
syntactic form to hold the position of the expression so when we see
errors in the core we can see which surface expression it came from.
This is really easy to do, just a fair amount of boilerplate that we
should add while no one is adding new features.

- Provide "stack traces" from the interpreter that show the locations
of visited expressions when an error reaches the top of the
interpreter. I think we can do this with a use of plai-typed/untyped
try-catch around interp-env and throwing a special exception that
contains a Result on non-normal returns. This requires the source
location information to be useful, and I can outline the design in my
head for anyone who wants to tackle it.
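
A minimal sketch of how the second and third tasks could fit together, written in Python rather than plai-typed and not taken from the lambda-py code: every expression form carries a position, and the interpreter wraps each recursive step in a try/except that records the current position on a special exception as it unwinds. All names here (`SrcLoc`, `Num`, `Div`, `InterpError`, `interp`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class SrcLoc:
    line: int
    col: int


@dataclass(frozen=True)
class Num:
    loc: SrcLoc
    value: int


@dataclass(frozen=True)
class Div:
    loc: SrcLoc
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Div]


class InterpError(Exception):
    """Hypothetical exception: a message plus the locations seen while unwinding."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message
        self.trace: List[SrcLoc] = []


def interp(expr: Expr) -> int:
    try:
        if isinstance(expr, Num):
            return expr.value
        if isinstance(expr, Div):
            left = interp(expr.left)
            right = interp(expr.right)
            if right == 0:
                raise InterpError("division by zero")
            return left // right
        raise InterpError("unknown expression form")
    except InterpError as err:
        # Record this expression's position (innermost first), then keep unwinding.
        err.trace.append(expr.loc)
        raise


# Hypothetical surface program on line 1: 8 / (4 / 0)
program = Div(SrcLoc(1, 0), Num(SrcLoc(1, 0), 8),
              Div(SrcLoc(1, 5), Num(SrcLoc(1, 5), 4), Num(SrcLoc(1, 9), 0)))

try:
    interp(program)
except InterpError as err:
    message = err.message
    trace = [(loc.line, loc.col) for loc in err.trace]
    print(message, "visited:", trace)
```

The trace is only as useful as the positions attached to each form, which is why the source-location task has to come first.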

Matthew Milano

Apr 2, 2013, 11:20:02 AM
to lamb...@googlegroups.com
This might seem a tad drastic, but has anyone considered taking a look at each of the stages and determining whether they would be easier to maintain and use if they were rewritten? If for no other reason, plai-typed/untyped is a bit of a silly language to use for a large project such as this, and plai-typed's type checker simply won't work at this scale. 

It may seem like an unduly large effort, but could save us much time and many bugs down the road.  During this re-write we would also be able to more easily make architectural changes for efficiency purposes. 

~matthew


Joe Gibbs Politz

Apr 2, 2013, 11:26:55 AM
to lamb...@googlegroups.com

The immediate trouble is that doing so incrementally is a bit of a pain because we have to rip out all the data definitions that are in plai-typed to make progress. One option is to see if we can crisply describe our concerns to Matthew Flatt and get the language improved.

Do we have a clear idea of what makes plai-typed annoying? Slow type checking is certainly a big one, but anything else?

I'm also happy to see someone do a little experiment in rewriting to regular structs and racket/base or Typed Racket. If there's a reasonable recipe for moving things over, we can do it on a large scale.

Other thoughts on general engineering improvements?

Matthew Milano

Apr 2, 2013, 12:33:12 PM
to lamb...@googlegroups.com
I agree that incremental re-writes are difficult; I think I was
suggesting a less-incremental approach. If we were to re-write the
project, there are a few ways to go about this.

1 - Though it's not nearly as incremental as we may like, we always do
have the option of taking the bigger chunks - all of interp, for
example - and re-implementing them in some language which can import
plai-typed/untyped functions and can be imported by plai-typed/untyped
modules. This is likely Racket or TR.

2 - We can take the plunge and re-implement all of a stage (desugar,
for example) in any language we feel like, and define some glue
between the data structures in that language and the equivalent ones
in our plai-typed/untyped codebase. The simplest way to implement
glue would just be to serialize the structures and pass strings to and
from the re-implemented segments.

3 - The final option is that we could discard the notion of doing this
incrementally altogether, and just start an implementation from
scratch in a language of our choice.
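
A minimal sketch of the string-based glue described in option 2, assuming a toy two-variant AST and JSON as the wire format. The node types (`Id`, `App`) and the functions `to_wire` and `from_wire` are hypothetical and are not lambda-py's data definitions.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Id:
    name: str


@dataclass(frozen=True)
class App:
    fun: "Expr"
    arg: "Expr"


Expr = Union[Id, App]


def to_wire(expr: Expr) -> str:
    """Serialize an AST to a string that another implementation could parse."""
    def encode(e):
        if isinstance(e, Id):
            return {"tag": "Id", "name": e.name}
        if isinstance(e, App):
            return {"tag": "App", "fun": encode(e.fun), "arg": encode(e.arg)}
        raise TypeError(f"unknown node: {e!r}")
    return json.dumps(encode(expr))


def from_wire(text: str) -> Expr:
    """Rebuild the AST from the string form produced by to_wire."""
    def decode(d):
        if d["tag"] == "Id":
            return Id(d["name"])
        if d["tag"] == "App":
            return App(decode(d["fun"]), decode(d["arg"]))
        raise ValueError(f"unknown tag: {d['tag']!r}")
    return decode(json.loads(text))


tree = App(Id("print"), App(Id("f"), Id("x")))
wire = to_wire(tree)
round_tripped = from_wire(wire)
```

The cost of this approach is that each side needs an encoder and decoder per variant, and these must be kept in sync by hand.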

If we go with one of these strategies, I would prefer 2 or 3. I have
personally found that attempts to redesign a project incrementally
often result in code that remains buggy, with various ill-fated design
choices made or preserved to maintain "backwards compatibility" with
the rest of the codebase.

In any case, the first step towards any refactoring or rewriting
effort is to review the stages of our existing codebase and determine whether they
would be easier to maintain and use if they were rewritten. The scale
of the overhaul we decide we need will determine which of the many
re-writing options makes the most sense.

~matthew

Joe Gibbs Politz

Apr 2, 2013, 1:39:06 PM
to lamb...@googlegroups.com
Would you still want to rewrite things even if they weren't
implemented in plai-typed?

Meaning --- is it the language that's broken, or do you think the
implementations of different pieces are too opaque and a rewrite is
just a good idea?

Matthew Milano

Apr 2, 2013, 2:26:24 PM
to lamb...@googlegroups.com
Short answer: yes, even if things weren't in plai-typed I would still
be proposing this.

I'm suggesting that we take a good look at the implementations that we
have and ask ourselves if they would be easier to maintain, extend,
and understand if rewritten. I'm by no means an expert in the
codebase, especially past desugar, so the answer to this could be
"no;" but from just listening to the discussions on bugs and on this
mailing list, it looks like every time we need to re-architect a
feature into lambda-py, we trigger a massive effort which often winds
up interacting poorly with the remaining code that wasn't touched in
the rewrite.

It could be that within the last weeks we've effectively completely
rewritten the segments of the project that needed rewriting. It could
be that the current design has good support for the remaining features
we want to incorporate into lambda-py. But I want us to take a few
days, read the codebase, and assert this. And if we find during this
process that the code leaves something to be desired, then we can
slate that code for rewriting.

I understand that this is a research effort; efficient code isn't our
goal and massive engineering efforts, even if they produce
much-improved code, are frequently a waste of time in the context of
producing an actual research result. We should only deem this
necessary if it will actually enable us to produce a more effective
artifact by artifact submission time.

~matthew

Joe Gibbs Politz

Apr 2, 2013, 3:41:54 PM
to lamb...@googlegroups.com
It's worth considering. I'd like to hear more from others.

Opinions on good ways forward here? Things to think about:

- Will it be less effort overall to re-implement some or all of what we have?

- How hard is it to build and extend what we have now?

- What are strategies for incrementally improving what we have now?

- (Matthew) Do you have an idea of what the codebase should look like
after rewriting? Why can't we get there incrementally?

Matthew Milano

Apr 2, 2013, 6:58:48 PM
to lamb...@googlegroups.com
For my part - I'm unfamiliar with large swaths of the project, having
concentrated all of my efforts on the earlier stages of desugaring. I
don't have some sort of vision of what the codebase will look like
when we're done. And it's quite possible we will be able to get where
we need to go working incrementally. The parts of the codebase that
would be candidates for rewriting are the parts where the design that
we use has diverged substantially from the design in use when the code
was written, resulting in potentially-contradictory program logic. I
will use python-phase2.rkt as an example of a file I might rewrite,
why I might rewrite it, and what I'd fix by rewriting.

Some aspects of python-phase2.rkt are seriously dense, there's some
repeated code, and one of the passes uses mutation to keep track of
state as it works. I will likely fold several of the passes in phase2
into one pass, eliminate any mutation in the phase, and try to
eliminate (or at least keep track of) repeated code segments. I gave
myself quite a few bugs in the last week before the deadline by
changing an algorithm in one location that was employed frequently in
other locations.

Additionally, phase2 evolved over time, as did much of our codebase;
as my understanding of what was needed in class desugaring and locals
evolved, so too did the code. This has resulted in some contradictory
design choices and messy code, with some dead code branches likely.
Working with the code has become difficult, and a new design would
make maintaining phase2 substantially easier.

~matthew

Joe Gibbs Politz

Apr 2, 2013, 10:25:03 PM
to lamb...@googlegroups.com
Sounds reasonable.

What do people think about other parts of the project? I actually
think the interpreter is in pretty OK shape, aside from the implicit
state in the store that I'd like to get rid of, a few cleanups to our
expression and value datatypes, and some primitives that should be
expressions. I feel like it's OK for that part to proceed
incrementally, but please speak up if you have a different opinion,
because I want to hear it.
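
A minimal sketch of a store without implicit state, following the pair shape proposed at the start of this thread: a map from addresses to values plus the next address to allocate. It is written in Python, with JSON standing in for whatever on-disk format would actually be used, and the names (`Store`, `alloc`, `dump_store`, `load_store`) are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Dict, Tuple

Address = int
Value = object  # stand-in for the interpreter's value type


@dataclass(frozen=True)
class Store:
    cells: Dict[Address, Value]
    next_addr: Address


def alloc(store: Store, value: Value) -> Tuple[Store, Address]:
    """Return a new store holding `value` at the next free address, plus that address."""
    addr = store.next_addr
    cells = dict(store.cells)  # copy, so the old store is left untouched
    cells[addr] = value
    return Store(cells, addr + 1), addr


def dump_store(store: Store) -> str:
    """Serialize the whole store, including the allocation counter."""
    pairs = [[a, v] for a, v in sorted(store.cells.items())]
    return json.dumps({"next_addr": store.next_addr, "cells": pairs})


def load_store(text: str) -> Store:
    data = json.loads(text)
    return Store({a: v for a, v in data["cells"]}, data["next_addr"])


empty_store = Store({}, 0)
store1, a0 = alloc(empty_store, "builtins")
store2, a1 = alloc(store1, 42)

# Because the next address lives in the store, one blob captures everything.
restored = load_store(dump_store(store2))
store3, a2 = alloc(restored, "after reload")
```

Since allocation no longer depends on a hidden counter, a reloaded store continues allocating where the saved one stopped.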

I'm least familiar with the lexical phases and desugaring; most of my
time has been spent in the interpreter and the libraries.

Junsong Li

Apr 2, 2013, 10:26:05 PM
to lamb...@googlegroups.com
It seems that I missed a really important discussion. 

As for annoying things in plai-typed: what do you think of code like (append ... empty empty empty), and of the type error on (format "a string ~a ~a" a b), caused by the typed "format" taking exactly two arguments? I personally think they are a little bit silly.

I have only worked on a small part of the project, so I am not really sure about the idea of rewriting the whole project. But the module system is really a matter of scope: if scope is easier to maintain in the rewritten codebase, the module system will be too.


Daniel Patterson

Apr 2, 2013, 10:27:20 PM
to lamb...@googlegroups.com
To chime in - I would be really happy about some large scale code
cleanup. For me, it would be ideal for us to figure out what the
bottlenecks in plai-typed are and to see if they can be fixed - our
code is really hard to test (because so much depends on the massive
initial set of libraries), and having type checking is a really nice
guarantee.

If those bottlenecks can't be worked out, I don't have a problem with
rewriting (probably in plain racket) for the sake of speed - I'm less
convinced about starting from scratch otherwise, simply because we
have a lot of code that does a lot of good stuff, and it is separated
out into a bunch of different phases that are more or less independent
- which make it seem like a pretty good target for refactoring. If all
our phases were a tangled mess, I'd feel really differently, and would
probably support gasoline + lighter (and then rewrite, of course).

But I say that having only poked around much of the code-base, so
maybe that's an overly optimistic view of where things stand.

Joe Gibbs Politz

Apr 2, 2013, 10:30:47 PM
to lamb...@googlegroups.com
> As for annoying things in plai-typed: what do you think of code like
> (append ... empty empty empty), and of the type error on (format "a string ~a
> ~a" a b), caused by the typed "format" taking exactly two arguments?
> I personally think they are a little bit silly.

Thanks for bringing these up! I'd forgotten about these and they
annoy me as well. I think there's a collection of these things we
could ask Matthew Flatt to fix, but if there are too many maybe we
should give up on plai-typed. For what it's worth these types would
also be tricky to write in OCaml; we may just have to be better about
writing helpers for ourselves.

Junsong Li

Apr 2, 2013, 10:43:49 PM
to lamb...@googlegroups.com
I think we can write macros to help us if needed.

@joe, is it possible to write a new language for this project? The reason we use plai-typed is that we need define-type and type-case, and nothing more. Why not implement those two things ourselves? I see you mentioned using Racket's structs.
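
To make concrete what "define-type and type-case, and nothing more" amounts to, here is a minimal sketch in Python rather than Racket: variants are plain record classes, and a hypothetical `type_case` helper dispatches on the variant and checks that every variant has a handler. This only illustrates the idea and is not a proposal for the actual implementation language.

```python
from dataclasses import dataclass, fields
from typing import Callable, Dict, Type


class Expr:
    """Base of the datatype; each subclass below is one variant."""


@dataclass(frozen=True)
class Num(Expr):
    n: int


@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr


def type_case(value: Expr, cases: Dict[Type[Expr], Callable]) -> object:
    """Dispatch on the variant and pass its fields positionally to the handler."""
    missing = [v.__name__ for v in Expr.__subclasses__() if v not in cases]
    if missing:
        # Checked at run time here; a typed language would check this statically.
        raise TypeError(f"type_case: missing variants {missing}")
    handler = cases[type(value)]
    return handler(*(getattr(value, f.name) for f in fields(value)))


def evaluate(e: Expr) -> int:
    return type_case(e, {
        Num: lambda n: n,
        Add: lambda left, right: evaluate(left) + evaluate(right),
    })


result = evaluate(Add(Num(1), Add(Num(2), Num(3))))
```

What this loses relative to plai-typed is the static guarantee: the exhaustiveness check only fires when the code runs.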

Alejandro Martinez

Apr 3, 2013, 11:24:13 AM
to lamb...@googlegroups.com
I agree WRT the interpreter, and IMO the library situation is similar: it needs some cleanup too, but it is pretty OK.

I think during this review we should make a list of pending features that are desirable to add, and a blueprint of the modifications, if any, needed in the infrastructure to support them. For example, a notable Python feature we have mostly ignored is calls with named parameters and dictionary expansion (**kwargs), which does not seem trivial to implement in our core. We don't really support default arguments either.
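
For reference, a short standard-Python example of the call behaviour the core would need to model: arguments passed by name, dictionary expansion, keyword-only parameters, and defaults that are evaluated once at definition time. The function names (`connect`, `remember`) are made up for illustration.

```python
def connect(host, port=80, *, timeout=10, **options):
    """Return the bound parameters so the binding rules are visible."""
    return (host, port, timeout, sorted(options.items()))


by_position = connect("example.org", 8080)
by_name = connect(port=8080, host="example.org")

# Dictionary expansion: known keys bind to parameters, the rest land in **options.
settings = {"host": "example.org", "timeout": 3, "retries": 2}
expanded = connect(**settings)


# Defaults are evaluated once, when the def runs, so a mutable default is shared.
def remember(item, seen=[]):
    seen.append(item)
    return seen


first = remember("a")
second = remember("b")
```

The shared `seen` list is the detail that forces default values to be stored with the function object instead of being re-evaluated per call.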


--
Alejandro.

Joe Gibbs Politz

Apr 7, 2013, 4:09:28 PM
to lamb...@googlegroups.com
Matthew - it sounds like you have some compelling reasons to re-do parts of phase1/phase2.  I don't see that code needing serious alterations (does anyone else?) so it seems good for you to take that on as a rewriting/refactoring task.  We have plenty of tests to check regressions on, so we'll know if the new solution works.

I think we'll be more conservative in our changes to other parts of lambda-py, since it seems dangerous to rip out our common vocabulary/understanding of values and the interpreter/etc.

I do think the idea of going through existing code and coming up with a work plan is a good one.  My suggestions that started this thread were external to any particular *feature*, and more just about general performance improvements and refactorings.

I think that Alejandro has a good proposal on the other thread for improving functions/default arguments.  In general, I think there are some things surrounding Python's various ways of doing function application that would be good to clean up and address.

The other major thing we should go for (and I know I've said this before, but I *do* think it's extremely valuable), is pushing through our unittest to run Python's tests unchanged.

Junsong, how should we make progress on that?  You had written a message saying that there were lots of minor things that need to be added to push through unittest:

     - SyntaxError is not caught in lambda-py, OverflowError, etc. are still missing.
     - operations like pow, shift are still missing.
     - daily use methods like split, join, startswith are still missing

Is the problem mainly that we lack library functions?  Because then we should just start tackling those until we can push through real tests.  Are there any other *features* we need to get this working?
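
A small sketch of how the items in that list could be turned into a checklist: each entry is one standard-Python behaviour, and `missing` names whichever ones an implementation fails. The `checks` dictionary and its labels are hypothetical. Under CPython every entry passes.

```python
checks = {
    "pow": pow(2, 10) == 1024 and 2 ** -1 == 0.5,
    "shift": (1 << 4) == 16 and (256 >> 2) == 64,
    "split": "a,b,c".split(",") == ["a", "b", "c"],
    "join": "-".join(["x", "y"]) == "x-y",
    "startswith": "lambda-py".startswith("lambda"),
}

# OverflowError: float exponentiation that exceeds the float range.
base = 2.0
try:
    base ** 10000
    checks["OverflowError"] = False
except OverflowError:
    checks["OverflowError"] = True

# SyntaxError: raised while compiling malformed source, and catchable.
try:
    compile("def (:", "<sketch>", "exec")
    checks["SyntaxError"] = False
except SyntaxError:
    checks["SyntaxError"] = True

missing = sorted(name for name, ok in checks.items() if not ok)
print("missing:", missing)
```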

Junsong Li

Apr 7, 2013, 10:03:29 PM
to lamb...@googlegroups.com
On 2013-4-8, at 4:09 AM, Joe Gibbs Politz <joe.p...@gmail.com> wrote:

> Matthew - it sounds like you have some compelling reasons to re-do parts of phase1/phase2.  I don't see that code needing serious alterations (does anyone else?) so it seems good for you to take that on as a rewriting/refactoring task.  We have plenty of tests to check regressions on, so we'll know if the new solution works.

Matthew and Joe, may I join parts of the phase work? I realize that without a better understanding of phase1 and phase2, it is not possible for me to bring a better module system into lambda-py. I could start on something minor in phase1/phase2, like code cleanup, so that later I am able to help Matthew work on the two phases. What do you think?

> Junsong, how should we make progress on that?  You had written a message saying that there were lots of minor things that need to be added to push through unittest:
>
>      - SyntaxError is not caught in lambda-py, OverflowError, etc. are still missing.
>      - operations like pow, shift are still missing.
>      - daily use methods like split, join, startswith are still missing
>
> Is the problem mainly that we lack library functions?  Because then we should just start tackling those until we can push through real tests.  Are there any other *features* we need to get this working?

I will summarize them and email them to the group.

Joe Gibbs Politz

Apr 8, 2013, 3:21:14 PM
to lamb...@googlegroups.com
> Matthew and Joe, may I join parts of the phase work? I realize that without a better understanding of phase1 and phase2, it is not possible for me to bring a better module system into lambda-py. I could start on something minor in phase1/phase2, like code cleanup, so that later I am able to help Matthew work on the two phases. What do you think?

Matthew, do you want to write down an outline of your plan and we can see if there are separable tasks to split off?

Junsong, do you have a particular idea of changes that you'd like to make/pieces you need to understand?  What about modules needs information from these phases; is it just finding global identifiers, or something more subtle?

Matthew Milano

Apr 8, 2013, 3:29:20 PM
to lamb...@googlegroups.com
I'll be back in Providence on Wednesday and can build a more comprehensive plan when I return. 

Junsong - I'll give you a walk through of the phases after Wednesday. 

Junsong Li

Apr 8, 2013, 10:52:04 PM
to lamb...@googlegroups.com

> The other major thing we should go for (and I know I've said this before, but I *do* think it's extremely valuable), is pushing through our unittest to run Python's tests unchanged.
>
> Junsong, how should we make progress on that?  You had written a message saying that there were lots of minor things that need to be added to push through unittest:
>
>      - SyntaxError is not caught in lambda-py, OverflowError, etc. are still missing.
>      - operations like pow, shift are still missing.
>      - daily use methods like split, join, startswith are still missing
>
> Is the problem mainly that we lack library functions?  Because then we should just start tackling those until we can push through real tests.  Are there any other *features* we need to get this working?

Let me give you a simple summary.

Python's tests are located at Python-3.2.3/Lib/test. There are over 400 tests covering all features that Python has. 

The Lib/test directory is actually a package, which means that if we want to run the files in Lib/test directly and unchanged, we first need to implement package import.

The second is the library functions and methods. We don't have exec or eval, which results in 122 files that we cannot interpret; we don't have format methods, which results in 111 files that we cannot interpret. These numbers are not very accurate, since for some features, like audio and image, we do not intend to test them (do we?), but we cannot miss *any* library function if we want to run the tests. This needs a huge effort.

The third is modules. Built-in modules like os, sys, __main__, functools, etc., and modules like random, math, and array, are commonly used in the tests.

Then the syntax: we do not support "%s" string formatting yet, we do not support the "with" statement yet, etc.

I think there must be more that we need to cover. I have an idea for the further development of our project: let's use modified versions of the files in Python's test directory directly to test our interpreter. Since we have a working unittest module, we can keep the test cases themselves unchanged, which will keep us alert to things that are still missing and aim us straight at Python's own tests!
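
A minimal sketch of what such a file might look like, assuming only a working unittest module: one test case per feature named above ("%s" formatting, the format method, the "with" statement, exec/eval). The class and test names are hypothetical and are not taken from Python's Lib/test.

```python
import io
import unittest


class Resource:
    """Tiny context manager so the `with` test needs no files."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False  # do not swallow exceptions


class MissingFeatureTests(unittest.TestCase):
    def test_percent_format(self):
        self.assertEqual("%s-%d" % ("a", 3), "a-3")

    def test_format_method(self):
        self.assertEqual("{} {name}".format("hi", name="py"), "hi py")

    def test_with_statement(self):
        with Resource() as r:
            r.events.append("body")
        self.assertEqual(r.events, ["enter", "body", "exit"])

    def test_eval_and_exec(self):
        env = {}
        exec("x = 6 * 7", env)
        self.assertEqual(eval("x + 1", env), 43)


# Run the suite programmatically and keep the report out of stdout.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MissingFeatureTests)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
summary = (result.testsRun, len(result.failures), len(result.errors))
print(summary)
```

Each failing or erroring test then points at one missing feature, which gives a concrete work list.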

Junsong Li

Apr 8, 2013, 11:05:31 PM
to lamb...@googlegroups.com
> Junsong, do you have a particular idea of changes that you'd like to make/pieces you need to understand? What about modules needs information from these phases; is it just finding global identifiers, or something more subtle?

There is actually more. I am considering solving the problem of "from A import *", which requires me to understand the phases. And the __main__ module, which has a serious bug, requires me to understand how globals are raised. It is no bad thing for me to learn about scope anyway.
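
A sketch of why "from A import *" touches the scope phases, as far as I understand the problem: the set of names it binds depends on the module's contents at run time (its `__all__` if present, otherwise every name without a leading underscore), so it cannot be read off the importing file's text. The helper `star_import` is hypothetical and imitates the standard rule on a module built in memory.

```python
import types


def star_import(module, namespace):
    """Copy into `namespace` the names that `from module import *` would bind."""
    if hasattr(module, "__all__"):
        names = list(module.__all__)
    else:
        names = [n for n in vars(module) if not n.startswith("_")]
    for name in names:
        namespace[name] = getattr(module, name)
    return names


# Build a module "A" in memory instead of reading a file.
a = types.ModuleType("A")
exec("x = 1\n_hidden = 2\ndef f():\n    return x\n", a.__dict__)

scope = {}
bound = sorted(star_import(a, scope))

# With __all__ defined, only the listed names are bound, even private ones.
a.__all__ = ["_hidden"]
scope_with_all = {}
bound_with_all = star_import(a, scope_with_all)
```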

Joe Gibbs Politz

Apr 15, 2013, 10:36:02 AM
to lamb...@googlegroups.com
> Junsong - I'll give you a walk through of the phases after Wednesday.

Can we have this walk through on-list? I would benefit from having
some of those details go by me again, even if I can't read every
message in detail.

Jesse Millikan

Apr 17, 2013, 4:35:03 PM
to lamb...@googlegroups.com
This is tangential, but do you know what source location information you want, and how you want it expressed in the AST? That would be the next step for me to take with the parser unless you have other plans.

Daniel Patterson

Apr 17, 2013, 6:22:26 PM
to lamb...@googlegroups.com
Ragg provides them in the form of srclocs, I believe, and I think that
would be fine for us. Basically, we want them to be what racket likes,
so that if we ever make a #lang python, we can error highlight :)

(by the way, great job on this).

Jesse Millikan

Apr 23, 2013, 11:31:02 PM
to lamb...@googlegroups.com
Thanks, though Ragg actually does the majority of the parsing work. Allowing drop-in of the Python grammar is awesome. 

Anyway, I pushed a skeleton of source position support (in the parser) to branch "parser-src-pos" since it affects get-structured-python.rkt and is being written a bit differently than hinted. Specifically, each AST node with a known source position is wrapped in another special AST with source position info. If no one objects, I'll merge back into master and do source positions in the parser for AST nodes other than "Module".
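
A minimal sketch of the wrapper idea, in Python and not taken from the "parser-src-pos" branch: a node with a known position is wrapped in a special node that carries the position, and consumers unwrap it when they only care about the underlying form. The position record mirrors the five fields of Racket's srcloc struct (source, line, column, position, span). The names `SrcPos`, `Located`, `Name`, and `strip_pos` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrcPos:
    source: str
    line: int
    column: int
    position: int
    span: int


@dataclass(frozen=True)
class Name:
    id: str


@dataclass(frozen=True)
class Located:
    """Wrapper node: pairs any AST node with its source position."""
    pos: SrcPos
    node: object


def strip_pos(node):
    """Return (underlying node, outermost position or None), unwrapping wrappers."""
    pos = None
    while isinstance(node, Located):
        if pos is None:
            pos = node.pos
        node = node.node
    return node, pos


p = SrcPos("example.py", 3, 4, 27, 1)
wrapped = Located(p, Name("x"))
inner, where = strip_pos(wrapped)
bare, nowhere = strip_pos(Name("y"))
```

One design consequence is that every later pass must either preserve the wrappers or call something like `strip_pos`, whereas a position field on every form would keep the AST shape unchanged.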