folks, hi,
there's an important issue that i wanted to raise which has
implications for the unladen swallow project. i wanted to go over it
with you because you will encounter this issue soon enough, and there
is, i believe, only one solution. however, rather than just _tell_
you what i think it is, i'd like to solicit your input, through a
logical chain of questions, to see if you arrive independently at the
same conclusion, or, better, if you can come up with an alternate
solution. so _without_ telling you the solution, i'll begin by
describing the background, lead on from there, and then chip in as
things progress.
i did a proof-of-concept experiment last week which demonstrates that
it's possible to run gtk "helloworld.py" and a few other gtk demos,
using the http://pyjs.org python-to-javascript compiler and the
python-spidermonkey library.
the way it works is:
* a function in python adds entire modules, on demand, to the
javascript context.
    def pyjs_import_into_sm_context(module_name):
        # import the module on demand, then expose it under its own name
        exec "import %s as actual_module" % module_name
        smcontext.add_global(module_name, actual_module)
* the "importing function" itself is added to the javascript context
as well:
    smcontext.add_global("pyjs_import_into_sm_context",
                         pyjs_import_into_sm_context)
* when the pyjs compiler encounters an "import" statement, javascript
code is generated which, duh, calls the import function:
pyjs_import_into_sm_context("gtk")
thanks to python-spidermonkey's ability to recognise python objects as
javascript objects, this all works swimmingly well.
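for anyone who wants to poke at the mechanism without installing python-spidermonkey, here is a minimal sketch. note the assumptions: FakeContext is a hypothetical stand-in for the javascript context (it only records globals in a dict), and importlib is used in place of exec. neither is what the experiment actually ran.

```python
import importlib


class FakeContext(object):
    """Hypothetical stand-in for a javascript context: just a dict of globals."""

    def __init__(self):
        self.globals = {}

    def add_global(self, name, value):
        self.globals[name] = value


smcontext = FakeContext()


def pyjs_import_into_sm_context(module_name):
    # import the module on demand and expose it under its own name
    actual_module = importlib.import_module(module_name)
    smcontext.add_global(module_name, actual_module)


# the importer itself is exposed too, so that generated code can call it
smcontext.add_global("pyjs_import_into_sm_context", pyjs_import_into_sm_context)

# what the generated code would do on encountering "import math"
smcontext.globals["pyjs_import_into_sm_context"]("math")
print(sorted(smcontext.globals))
```

the point is only that the module object lands in the context lazily, at the moment the translated "import" runs.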
however, where it all starts to go a bit wobbly is on encountering the
python basic types, such as int, long, float etc. well - it _doesn't_
go wobbly, straight away,...
the question to ask is: how do you deal with the python c-based types -
PyInt_Type, PyLong_Type, PyFloat_Type, and likewise string, complex, etc.?
well, the simple solution is (in python-spidermonkey-terminology)
this. before starting the application, add the basic types to the
javascript context:
* smcontext.add_global("int", int) # yes, really add python int type
smcontext.add_global("long", long)
smcontext.add_global("str", str)
....
.... complex
.... bool
....
* modify the javascript compiler so that it churns out x.__add__(y)
instead of x + y etc.
    x = 3
    y = x + 5
  ==> javascript
    x = int(3);
    y = x.__add__(5);
again - running this actually works! amazingly, you can support the
_full_ range of python basic types by riding off the back of the
python runtime.
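the compiler change can be sketched in a few lines. this is _not_ the pyjs compiler - it's a toy built on python's own "ast" module (python 3.8 or later assumed), handling only simple assignments, integer literals and three operators. the names METHODS, emit_expr, emit_assign and translate are all made up for illustration.

```python
import ast

# python operator node -> name of the method the generated code should call
METHODS = {ast.Add: "__add__", ast.Sub: "__sub__", ast.Mult: "__mul__"}


def emit_expr(node):
    if isinstance(node, ast.BinOp):
        method = METHODS[type(node.op)]
        return "%s.%s(%s)" % (emit_expr(node.left), method, emit_expr(node.right))
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return str(node.value)
    raise NotImplementedError(ast.dump(node))


def emit_assign(stmt):
    # a bare literal is wrapped in int(...) so that it becomes an "int" object
    rhs = emit_expr(stmt.value)
    if isinstance(stmt.value, ast.Constant):
        rhs = "int(%s)" % rhs
    return "%s = %s;" % (stmt.targets[0].id, rhs)


def translate(source):
    return [emit_assign(stmt) for stmt in ast.parse(source).body]


for line in translate("x = 3\ny = x + 5"):
    print(line)
```

fed the two-line example, it prints the same two javascript lines as shown in the translation.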
so now, with that background in mind, we come to the important bit.
here is the vital, vital, question:
how do you optimise the above scenario, where you're unavoidably
"engaged" with the python runtime, assuming that you want full
support for "c python modules"?
i'll illustrate with a series of questions.
1) what's the big deal? what you have there works, why is it a
problem?
answer: i don't _want_ to be solely and exclusively dependent on the
code from python/Objects/intobject.c and python/Objects/longobject.c.
i want a *javascript* implementation of int, a *javascript*
implementation of long, like this:
    int = function(x) { this.x = x; };
    int.prototype.__class__ = {};
    int.prototype.__class__.__new__ = int;
    int.prototype.__class__.__name__ = "int"; /* to be able to identify
                                                 object instances */
    int.prototype.__add__ = function(x) { return new int(this.x + x); };
    int.prototype.__mul__ = function(x) { return new int(this.x * x); };
...
...
i want the JIT compiler to then do JIT-compilation of my javascript
implementation of the "int" class, with _full_ interoperability
between a python c-based "int" class and my lovely-looking javascript
implementation.
having to get involved with python c-based "int" is therefore a
serious barrier to speed improvements and optimisations, and being
_forced_ to use the python c-based runtime is just... well... yuk.
2) ok - go for it, knock yourself out - what's stopping you?
absolutely nothing.... _other_ than the fact that there is only one
recognised "int" type, in the low-level python code.
3) err.... what?
PyInt_Type. see python/Include/intobject.h:
PyAPI_DATA(PyTypeObject) PyInt_Type;
#define PyInt_Check(op) \
PyType_FastSubclass((op)->ob_type, Py_TPFLAGS_INT_SUBCLASS)
4) err... so what?
just creating something with the _name_ "class int" does NOT mean that
python c-based modules will recognise it as BEING an "int".
5) again - so what?
ok - do a walkthrough.
* you create an instance of a javascript class which is named "int".
* you then hand it to a python c-based module's function, e.g.
window.set_border_width()
* in the case of python-gtk c-based modules, it does this:
    gobject_assert(PyInt_Check(object),
                   "function parameter is not an integer");
  ... and the assertion fails.
the reason is simple: the javascript class which is named "int" is
_not_ of type PyInt_Type - it's actually (by the time it gets into the
c-based module) of some generic object type (or something).
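you can see the same effect from pure python, with no javascript involved at all. in this sketch JsInt is a hypothetical stand-in for the emulated int, and chr() stands in for "some c-implemented function that insists on a real integer" - it is not the gtk call above.

```python
class JsInt(object):
    """Stand-in for the javascript-emulated int: right name, wrong type."""

    def __init__(self, x):
        self.__name__ = "int"
        self.x = x


value = JsInt(65)
print(value.__name__)            # the object *calls itself* an int ...
print(isinstance(value, int))    # ... but the type check disagrees

try:
    chr(value)                   # c-implemented builtin, checks its argument
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```

the name is irrelevant: the c code looks at the type, and only the type.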
6) ah. i see. oh dear. yes, that's a bit of a downer. ok, so how
about this - what about forcing an automatic two-way conversion, so
that whenever you have something that's a "javascript" class of type
int, you convert it to a _python_ class of type int - etc.
sounds like a _great_ idea - let's do a walkthrough of the
implications.
* compiler creates code that creates javascript int instances
* in python-spidermonkey, get the javascript object's "__name__"
* if "__name__" == "int", then grab the javascript object's "x"
instance variable
* pass the "x" instance variable contents to PyInt_FromLong()
* return that as the python object.
sounds _great_! (and in fact this is exactly what is done, right
now.)
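in python terms, the conversion step looks roughly like this. JsObject and js_to_python are hypothetical names: the real thing lives in python-spidermonkey's c code and operates on spidermonkey objects, so treat this purely as a model of the logic.

```python
class JsObject(object):
    """Hypothetical stand-in for a javascript object seen from the python side."""

    def __init__(self, name, **members):
        self.__name__ = name
        self.__dict__.update(members)


def js_to_python(obj):
    # emulated ints are unwrapped into real python ints;
    # everything else is passed through as-is
    if isinstance(obj, JsObject) and obj.__name__ == "int":
        return int(obj.x)        # the c code would call PyInt_FromLong() here
    return obj


converted = js_to_python(JsObject("int", x=65))
print(type(converted).__name__)
print(chr(converted))            # a c-implemented function now accepts it
```

for a value handed over directly, that's all it takes.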
7) ok - great! so it's solved, right? there's no problem - if that's
what's going on, it works, right?
errrr... no. otherwise i wouldn't be raising this, here, would i? :)
the crucial issue is this: what happens when you transfer a complex
data structure such as a _class_ instance across the
python-spidermonkey boundary, between c-based modules and the
javascript execution engine?
what are the implications?
again - a walkthrough:
    # code to be translated into javascript:
    class foo:
        def __init__(self):
            self.x = int(5)  # this will become a javascript "int" object,
                             # NOT a python int object
* receive javascript object to be handed to a c-based module
* in python-spidermonkey, get the javascript object's "__name__"
* the object is of type "foo" - let it pass.
* c-based module tries to access foo class "x" member variable
* calls PyInt_Check(object->x) and this check FAILS.
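the walkthrough can be modelled in python, under the same assumptions as before: JsObject and js_to_python are hypothetical stand-ins for the javascript object and the boundary conversion, and isinstance() plays the part of PyInt_Check().

```python
class JsObject(object):
    """Hypothetical stand-in for a javascript object seen from the python side."""

    def __init__(self, name, **members):
        self.__name__ = name
        self.__dict__.update(members)


def js_to_python(obj):
    # shallow conversion: only looks at the object's own "__name__"
    if isinstance(obj, JsObject) and obj.__name__ == "int":
        return int(obj.x)
    return obj


# javascript-side instance of class foo, with self.x = int(5)
foo = JsObject("foo", x=JsObject("int", x=5))

handed_over = js_to_python(foo)
print(handed_over is foo)                # "foo" is let through untouched
print(isinstance(handed_over.x, int))    # the member inside is still emulated
```

the check at the boundary never sees the member, so the emulated int reaches the c-based module unconverted.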
8) oh _god_. oh no - i'm beginning to get it.
yep. you need to do a TOTAL, complete, ONE HUNDRED PERCENT object
walk of the ENTIRE javascript object-space, doing a TOTAL
transformation of ALL basic python types, as "emulated" in javascript,
into "real" python objects.
not just ints, but strings, unicode strings, floats, complex numbers,
bools, longs, dicts, lists - absolutely everything.
and - worse - vice-versa! when the object comes back, from python to
javascript, you'd need to translate it the other way!
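to make the cost concrete, here is a sketch of the javascript-to-python half of such a walk. assumptions: JsObject is a hypothetical stand-in, only emulated ints are handled, containers are rewritten in place, dict keys are left alone, and a "seen" set guards against cycles. the python-to-javascript half would be the mirror image.

```python
class JsObject(object):
    """Hypothetical stand-in for a javascript object seen from the python side."""

    def __init__(self, name, **members):
        self.__name__ = name
        self.__dict__.update(members)


def deep_to_python(obj, seen=None):
    """Replace every emulated int reachable from obj with a real int."""
    seen = set() if seen is None else seen
    if id(obj) in seen:                  # already visited (handles cycles)
        return obj
    if isinstance(obj, JsObject) and obj.__name__ == "int":
        return int(obj.x)
    if isinstance(obj, list):
        seen.add(id(obj))
        obj[:] = [deep_to_python(item, seen) for item in obj]
    elif isinstance(obj, dict):
        seen.add(id(obj))
        for key in list(obj):
            obj[key] = deep_to_python(obj[key], seen)
    elif isinstance(obj, JsObject):
        seen.add(id(obj))
        for key, value in list(vars(obj).items()):
            setattr(obj, key, deep_to_python(value, seen))
    return obj


foo = JsObject("foo",
               x=JsObject("int", x=5),
               items=[JsObject("int", x=1), {"k": JsObject("int", x=2)}])
deep_to_python(foo)
print(foo.x, foo.items)
```

every crossing of the boundary has to touch every reachable object, in both directions, whether or not the c-based module ever looks at it.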
9) that's ... _horrific_
yep.
10) so... uh... what's the solution?
i don't know - well, i have some ideas, but i'm reluctant to voice
them right now - i wanted to hear what other people have to say before
raising them, in the hope that someone can come up with an alternative
idea.
whilst writing this, it _did_ occur to me that the javascript int
"class" (not that there is the concept of classes in javascript, but
you can emulate a very close approximation) - the javascript "int
object emulator" - could "inherit" from the _real_ python "int"
object. but, again, that really defeats the point of the exercise of
being able to do optimisation!
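the python-side analogue of that idea is just subclassing int. in this sketch EmulatedInt is a made-up name, and chr() again stands in for a c-implemented function that type-checks its argument.

```python
class EmulatedInt(int):
    """Sketch: an "int emulator" that inherits from the real int type."""

    def __add__(self, other):
        # our own arithmetic, but the result is still backed by a real int
        return EmulatedInt(int(self) + int(other))


n = EmulatedInt(60) + 5
print(type(n).__name__)
print(isinstance(n, int))
print(chr(n))
```

the subclass sails through the type check, but every instance still _is_ a c-level int underneath, which is precisely the dependency i was trying to get away from.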
i hope that you can appreciate the implications for unladen swallow,
pretty much simply by substituting "LLVM" for "javascript", in the
above.
this is an issue which you _will_ encounter (if you haven't already) -
it would be ridiculous to _always_ have to bind to the python c
runtime for int, long, str, unicode etc. on the _remote_ offchance
that someone _might_ use a c-based python module and end up passing
basic types to that c-based module. it makes a mockery of the whole
concept of doing an optimising JIT compiler.
so - gloves off: what are the possibilities?
bearing in mind that both projects face exactly the same barrier, they
will likely be able to take advantage of the same solution.
in particular, i'd like to point out that i _have_ noted the
possibility of modifying the "standard" python c source as something
that you consider to be acceptable (which is good). that's a hint,
btw :) i've said enough.
l.