Christian Iversen
Aug 7, 2013, 8:55:40 PM
to pyj...@googlegroups.com
Hello World
Dear leader checking in again. I just wanted to update you all on the
progress.
I have made a number of local feature branches in my git repos:
  ist/attr
  ist/call
  ist/cli
* ist/compiler
  ist/func
  ist/hash
The branch ist/compiler represents the new compiler devel branch. Each
of the other branches is a feature branch, implementing a specific new
feature or refactor.
Samuel (neppord) helped me save myself from a git mistake, and taught me
how to use rebase. So now I've got everything cleaned up, and have
fast-forwarded _everything_ into ist/compiler. I've also cleaned up the
branches on github, and pushed everything.
So now the status is (github might be off by a few tests, since I have a
few unpushed commits):
Ran 246 tests in 7.837s
OK
We have 23 disabled known-to-fail tests right now. However, in the new
model, we should be able to _solve_ these problems. A lot of them come
down to little quirks in the compiler or stdlib that used to be really
hard to fix. With the new compiler and infrastructure, they can be solved!
So if you want to take a look, go to github and check out ist/compiler now!
And for the really interested, here's a short recap of what the branches do:
## ist/attr ##
Attribute access used to be compiled so that
obj.attr
was translated into
obj.PY$__getattr__("attr")
But this was problematic for several reasons. First, it allows any
object to _completely_ override the compiler's sense of semantics, and
second, it's not how python does it. After learning about
__getattribute__ and descriptor semantics, I decided to wrap it
differently. Now, we call:
$PY.getattr(obj, "attr")
This makes the semantics uniform, and lets us implement complete
descriptor semantics even for very advanced things like properties, true
static methods, custom descriptors, and (later) metaclasses.
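To make that concrete, here is a rough sketch of what a $PY.getattr with
descriptor semantics can look like. This is my illustration only, not the
actual code in ist/attr; the lookup_on_type helper and the
PY$__class__/PY$__mro__ layout are assumptions made up for the example:

// Sketch only -- not the actual ist/attr code.  The object layout
// (PY$__class__, PY$__mro__) and the helpers are assumptions.
var $PY = $PY || {};                    // runtime namespace

function lookup_on_type(obj, key) {
    // Walk the class and its bases, looking for a class attribute
    var mro = obj.PY$__class__.PY$__mro__ || [obj.PY$__class__];
    for (var i = 0; i < mro.length; i++) {
        if (mro[i].hasOwnProperty(key))
            return mro[i][key];
    }
    return undefined;
}

$PY.getattr = function(obj, name) {
    var key = "PY$" + name;
    var cls = obj.PY$__class__;
    var cls_attr = lookup_on_type(obj, key);

    // 1. Data descriptors on the class win over the instance dict
    if (cls_attr && cls_attr.PY$__get__ && cls_attr.PY$__set__)
        return cls_attr.PY$__get__(cls_attr, obj, cls);

    // 2. Then the instance's own attributes
    if (obj.hasOwnProperty(key))
        return obj[key];

    // 3. Then non-data descriptors (plain functions become bound
    //    methods) and ordinary class attributes
    if (cls_attr && cls_attr.PY$__get__)
        return cls_attr.PY$__get__(cls_attr, obj, cls);
    if (cls_attr !== undefined)
        return cls_attr;

    // 4. Finally fall back to a user-defined __getattr__ hook
    if (obj.PY$__getattr__)
        return obj.PY$__getattr__(obj, name);
    throw "AttributeError: " + name;    // the real runtime raises a python exception
};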
## ist/call ##
Another area of interest was argument passing. Earlier we used quite
clumsy argument parsing. For the function
def foo(x):
    print x
This was the argument-parsing preamble:
var __kwargs = __kwargs_get(arguments);
var __varargs = __varargs_get(arguments);
var $v1 = Array.prototype.slice.call(arguments).concat(js(__varargs));
var x = ('x' in __kwargs) ? __kwargs['x'] : $v1[0];
And things only got worse when using *args or **kwargs.
Now, for the same function, we have:
var $pyargs = __uncook(arguments);
var $v1 = $pyargs.varargs;
var x = ($pyargs.kw.x || $v1[0]);
delete $pyargs.kw.x;
You might say that this is still 4 lines. But
A) It's simpler
B) It's faster
C) It's easier to read
D) It has several corner case bugfixes
E) It's even better with more than 1 arg
F) It isn't even optimized fully yet :)
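For those wondering what __uncook does: it basically splits the raw
javascript arguments object into its positional and keyword parts in one
place, instead of every function doing it by hand. A simplified sketch
(not the literal runtime code; in particular, the "tagged trailing
object" convention for keyword arguments below is just an assumption for
the example):

// Sketch only -- the real __uncook differs; the kwargs convention
// (a tagged trailing object) is an assumption for illustration.
function __uncook(args) {
    var varargs = Array.prototype.slice.call(args);
    var kw = {};
    var last = varargs[varargs.length - 1];

    // If the caller attached a keyword-argument bundle, peel it off
    if (last && last.__is_kwargs__) {
        kw = last.values;
        varargs.pop();
    }
    return { varargs: varargs, kw: kw };
}

// The preamble shown above then only has to pick values out of it:
//   var $pyargs = __uncook(arguments);
//   var x = ($pyargs.kw.x || $pyargs.varargs[0]);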
## ist/hash ##
Hashing is tricky. Python places certain requirements on hashes (for
example, objects that compare equal must hash equal), but implementing
_identical_ behavior to CPython is neither practical nor required.
Looking at PyPy, it's clear that they have a fully-working hashing
algorithm, but it often produces results that differ from CPython's. So
now ours uses something similar to CPython, but simplified for our
target platform while still working correctly.
Also, I have made sure that all hash values are real int-objects, which
wasn't always the case before.
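To give a feel for what "similar to CPython, but simplified" means, here
is an illustrative string hash in that spirit. It mimics CPython 2's
string hash but sticks to the 32-bit integer arithmetic javascript gives
us; it is an example of the approach, not necessarily the exact constants
or code in ist/hash:

// Illustration only -- not necessarily what ist/hash ships.
function string_hash(s) {
    if (s.length === 0)
        return 0;
    var h = s.charCodeAt(0) << 7;
    for (var i = 0; i < s.length; i++) {
        // wrap-around multiply at 32 bits (CPython wraps at its native word size)
        h = Math.imul(h, 1000003) ^ s.charCodeAt(i);
    }
    h = h ^ s.length;
    if (h === -1)
        h = -2;            // CPython reserves -1 as an error marker
    return h;              // gets wrapped in a real int-object by the stdlib
}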
## ist/cli ##
I often found myself wanting to test just a line or two of code, and
writing it into a file and then compiling and running that file was
quite a chore. So, I created a command line frontend for Pyjaco. It aims
to be similar to "python" (CPython) and "pypy" (PyPy), and it really
feels similar:
Python 2.x (pyjaco)
[google v8] on Linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print " ".join([s.upper() for s in ["pyjaco", "is", "awesome"]])
PYJACO IS AWESOME
I also made sure that each of those four functions/string methods now
works like it does in CPython. They didn't, earlier.
## ist/func ##
Finally, the grand stdlib rewrite.
Another big problem, one that has haunted me since the 1.x days, is that
function generation used to be context-specific. Methods and functions
did not parse their arguments in the same way, because methods got an
implicit first argument (this) that functions didn't. So if a function
switched context during execution (by being assigned to a class, wrapped
in a decorator, or similar), it would fail the parameter parsing because
it either still expected "this", or still didn't expect it.
That's all fixed now! In the new model, called "purely pythonic args",
you always explicitly give each python argument as a positional argument
in javascript. That does occasionally lead to some funny-looking code in
the stdlib, things like:
return x.PY$__eq__(x, y)
Whereas before you would simply write
return x.PY$__eq__(y)
But the advantages are many. For example, calling a method directly
through a class (such as a base-class method) is now super easy:
return object.PY$__eq__(x, y)
And everything related to decorators, functions and methods now works
correctly. I still have many ideas for optimizations that will allow us
to go further than before. But those are yet to be implemented (patches
always welcome) :)
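To show why the explicit first argument helps, here is a tiny hand-written
sketch (not compiler output; SomeClass and the raw javascript booleans are
simplifications for the example). The same function object works unchanged
as a free function, as a method on a class, and wrapped decorator-style,
because nothing depends on the javascript "this":

// Sketch only -- hand-written, not what the compiler emits.
var eq = function(self, other) {
    // "self" is just a normal positional argument
    return self.PY$value === other.PY$value;   // the real stdlib would return a python bool
};

var SomeClass = {};              // stand-in for a compiled class object
SomeClass.PY$__eq__ = eq;        // used as a method ...

var standalone = eq;             // ... or as a plain function ...

var logged = function(self, other) {   // ... or wrapped, decorator-style
    console.log("comparing");
    return eq(self, other);
};

// All three call sites pass the instance explicitly, so argument
// parsing never breaks:
//   a.PY$__eq__(a, b);    standalone(a, b);    logged(a, b);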
One small problem in this new model is that repeating the first argument
is simply not valid if evaluating that argument has side effects.
Consider the following (pathological) example:
print "New count is:", increment_counter()
Printing the value means calling a method on it (such as its __str__), so
a naive compilation would repeat the receiver expression, and
increment_counter() would be called twice, giving wrong results. The
compiler handles this by using a $PY.call function, like so (in
pseudo-code):
$PY.call(obj, func, *args)
This will call
obj[func](obj, *args)
So no double-evaluation of the first argument takes place. In the
compiler, this is handled in the "purecall" method. It even
special-cases simple cases where it's easier (and safe) to just repeat
the value.
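In plain javascript terms, the helper is essentially this (a sketch; the
real argument handling is richer than shown):

// Sketch only -- real $PY.call handles more than this.
var $PY = $PY || {};             // runtime namespace (already exists in pyjaco)
$PY.call = function(obj, func) {
    // "obj" has already been evaluated exactly once by the caller;
    // look up the method and pass obj explicitly as the first argument.
    var args = Array.prototype.slice.call(arguments, 2);
    return obj[func].apply(obj, [obj].concat(args));
};

// So the print example compiles along the lines of
//   $PY.call(increment_counter(), "PY$__str__")
// and increment_counter() is only evaluated once.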
So that's it for now. If you have time, go play with the new
ist/compiler branch!
--
Kind regards
Christian Iversen