Integration / compatibility with c-based python modules while still being able to do optimisation


lkcl

Apr 1, 2009, 8:18:58 AM
to Unladen Swallow
folks, hi,

there's an important issue that i wanted to raise which has
implications for the unladen swallow project. i wanted to go over it
with you because you will encounter this issue soon enough, and there
is, i believe, only one solution. however, rather than just _tell_
you what i think it is, i'd like to solicit your input, through a
logical chain of questions, to see if you arrive independently at the
same conclusion, or, better, if you can come up with an alternate
solution. so _without_ telling you the solution, i'll begin by
describing the background, lead on from there, and then chip in as
things progress.

i did a proof-of-concept experiment last week using python-
spidermonkey which demonstrates that it's possible to run gtk
"helloworld.py" and a few other gtk demos, using the http://pyjs.org
python-to-javascript compiler and the python-spidermonkey library.

the way it works is:

* a function in python adds entire modules, on demand, to the
javascript context:

      def pyjs_import_into_sm_context(module_name):
          exec "import %s as actual_module" % module_name
          smcontext.add_global(actual_module, module_name)

* the "importing function" itself is added to the javascript context
as well:

      smcontext.add_global(pyjs_import_into_sm_context,
                           "pyjs_import_into_sm_context")

* when the pyjs compiler encounters an "import" statement, javascript
code is generated which, duh, calls the import function:

      pyjs_import_into_sm_context("gtk")

thanks to python-spidermonkey's ability to recognise python objects as
javascript objects, this all works swimmingly well.
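
to make the mechanism concrete, here is a minimal, self-contained
sketch of the bridge. it is illustrative only: FakeJSContext is a
stand-in for a python-spidermonkey-style Context object with the
add_global(obj, name) method used above, and "math" is just a handy
module to demonstrate with.

      import importlib

      class FakeJSContext(object):
          """stand-in for a python-spidermonkey Context: just records the
          globals it is given (the real Context executes javascript)."""
          def __init__(self):
              self.globals = {}

          def add_global(self, obj, name):
              # expose a python object under a name in the (hypothetical)
              # javascript namespace
              self.globals[name] = obj

      smcontext = FakeJSContext()

      def pyjs_import_into_sm_context(module_name):
          # import the requested python module and expose it, whole,
          # to the javascript side
          actual_module = importlib.import_module(module_name)
          smcontext.add_global(actual_module, module_name)

      # the importer itself is exposed, so generated javascript can call it:
      smcontext.add_global(pyjs_import_into_sm_context,
                           "pyjs_import_into_sm_context")

      # the generated javascript for "import math" amounts to:
      #     pyjs_import_into_sm_context("math");
      pyjs_import_into_sm_context("math")
      print(smcontext.globals["math"].sqrt(16.0))   # 4.0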

however, where it all starts to go a bit wobbly is on encountering the
python basic types, such as int, long, float etc. well - it _doesn't_
go wobbly, straight away,...

the question to ask is: how do you deal with python c-based types,
PyInt_Type, PyLong_Type, PyFloat_Type, ..... string, complex, etc.

well, the simple solution is (in python-spidermonkey-terminology)
this. before starting the application, add the basic types to the
javascript context:

* smcontext.add_global("int", int) # yes, really add python int type
smcontext.add_global("long", long)
smcontext.add_global("str", str)
....
.... complex
.... bool
....

* modify the python-to-javascript compiler so that it churns out
x.__add__(y) instead of x + y, etc.:

      # python:
      x = 3
      y = x + 5

      // generated javascript:
      x = int(3);
      y = x.__add__(5);

again - running this actually works! amazingly, you can support the
_full_ range of python basic types by riding off the back of the
python runtime.

so now, with that background in mind, we come to the important bit.
here is the vital, vital, question:

how do you optimise the above scenario, where you're unavoidably
"engaged" with the python runtime in this fashion, assuming that you
want full support for c-based python modules?

i'll illustrate with a series of questions.

1) what's the big deal? what you have there works, why is it a
problem?

answer: i don't _want_ to be solely and exclusively dependent on the
code from python/Objects/intobject.c and python/Objects/longobject.c.
i want a *javascript* implementation of int, a *javascript*
implementation of long, like this:

      int = function(x) { this.x = x; };
      int.prototype.__class__ = { __new__: int,
                                  __name__: "int" };  /* to be able to identify
                                                         object instances */
      int.prototype.__add__ = function(x) { return new int(this.x + x); };
      int.prototype.__mul__ = function(x) { return new int(this.x * x); };
      ...
      ...

i want the JIT compiler to then do JIT-compilation of my javascript-
implementation of the "int" class, with _full_ interoperability
between a python c-based "int" class and my lovely-looking javascript
implementation.

having to get involved with python c-based "int" is therefore a
serious barrier to speed improvements and optimisations, and being
_forced_ to use the python c-based runtime is just... well... yuk.

2) ok - go for it, knock yourself out - what's stopping you?

absolutely nothing.... _other_ than the fact that there is only one
recognised "int" type, in the low-level python code.

3) err.... what?

PyInt_Type. see python/Include/intobject.h:

      PyAPI_DATA(PyTypeObject) PyInt_Type;

      #define PyInt_Check(op) \
          PyType_FastSubclass((op)->ob_type, Py_TPFLAGS_INT_SUBCLASS)

4) err... so what?

just creating something with the _name_ "class int" does NOT mean that
python c-based modules will recognise it as BEING an "int".

5) again - so what?

ok - do a walkthrough.

* you create an instance of a javascript class which is named "int".
* you then hand it to a python c-based module's function, e.g.
window.set_border_width()
* in the case of python-gtk c-based modules, it does something like this:

      gobject_assert(PyInt_Check(object),
                     "function parameter is not an integer");

the reason is simple: the javascript class which is named "int" is
_not_ of type PyInt_Type - by the time it gets into the c-based module
it's just a generic wrapper PyObject (or something).
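
the same failure can be seen in pure python terms. this is a hedged
analogy only - MyInt is made up for illustration, it is not the pygtk
or python-spidermonkey code - but it shows that a class which merely
*behaves* like an int does not pass the C-level type check:

      class MyInt(object):
          """stands in for the javascript 'int' emulation: it behaves like
          an int at the python level, but it is NOT the C-level int type."""
          def __init__(self, x):
              self.x = x
          def __add__(self, other):
              return MyInt(self.x + other)

      fake = MyInt(5)
      real = 5

      # isinstance() here plays the role that PyInt_Check() plays inside a
      # c extension module: it tests the concrete C type, not the behaviour.
      print(isinstance(real, int))   # True
      print(isinstance(fake, int))   # False - so set_border_width(fake) blows up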

6) ah. i see. oh dear. yes, that's a bit of a downer. ok, so how
about this - what about forcing an automatic two-way conversion, so
that whenever you have something that's a "javascript" class of type
int, you convert it to a _python_ class of type int - etc.

sounds like a _great_ idea - let's do a walkthrough of the
implications.

* compiler creates code that creates javascript int instances
* in python-spidermonkey, get the javascript object's "__name__"
* if "__name__" == "int", then grab the javascript object's "x"
instance variable
* pass the "x" instance variable contents to PyInt_FromLong()
* return that as the python object.

sounds _great_! (and in fact this is exactly what is done, right
now.)
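
a minimal sketch of that boundary conversion follows. it is
illustrative only: JSObject and js_to_python() are assumed names
standing in for python-spidermonkey's real wrapper machinery.

      class JSObject(object):
          """stand-in for a wrapped javascript object crossing the boundary."""
          def __init__(self, name, attrs):
              self.__name__ = name       # the javascript "class" name
              self.attrs = attrs         # its instance variables

      def js_to_python(js_obj):
          """convert a javascript 'int' emulation into a real python int,
          exactly as in the walkthrough above."""
          if js_obj.__name__ == "int":
              # the C-level equivalent would be PyInt_FromLong(js_obj->x)
              return int(js_obj.attrs["x"])
          # anything else passes through untouched - which is precisely
          # where the trouble starts (see question 7)
          return js_obj

      js_five = JSObject("int", {"x": 5})
      print(js_to_python(js_five))       # 5, a real python int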

7) ok - great! so it's solved, right? there's no problem - if that's
what's going on, it works, right?

errrr... no. otherwise i wouldn't be raising this, here, would i? :)

the crucial issue is this: what happens when you transfer a complex
data structure such as a _class_ instance across the python-
spidermonkey boundary, between c-based modules and the javascript
execution engine?

what are the implications?

again - a walkthrough:

      # code to be translated into javascript:
      class foo:
          def __init__(self):
              self.x = int(5)   # this will become a javascript "int" object,
                                # NOT a python int object

* receive the javascript "foo" object, to be handed to a c-based module
* in python-spidermonkey, get the javascript object's "__name__"
* the object is of type "foo" - let it pass.
* the c-based module tries to access the foo instance's "x" member variable
* it calls PyInt_Check() on that member, and the check FAILS.

8) oh _god_. oh no - i'm beginning to get it.

yep. you need to do a TOTAL, complete, ONE HUNDRED PERCENT object
walk of the ENTIRE javascript object-space, doing a TOTAL
transformation of ALL basic python types, as "emulated" in javascript,
into "real" python objects.

not just ints, but strings, unicode strings, floats, complexes, bools,
longs, dicts, lists - absolutely everything.

and - worse - vice-versa! when the object comes back, from python to
javascript, you'd need to translate it the other way!
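
to make the scale of that concrete, here is a hedged sketch of the
kind of recursive converter the boundary would need. it reuses the
illustrative JSObject convention from the sketch above, so every name
here is an assumption rather than real python-spidermonkey API:

      def deep_js_to_python(obj):
          """recursively walk a wrapped javascript object graph, replacing
          every emulated basic type with the real python equivalent.
          sketch only: a real version also needs strings, floats, bools,
          longs, dicts, lists, cycle detection - and the reverse
          python-to-javascript pass on the way back."""
          name = getattr(obj, "__name__", None)
          attrs = getattr(obj, "attrs", None)
          if name == "int" and attrs is not None:
              return int(attrs["x"])            # PyInt_FromLong(), in C terms
          if attrs is not None:
              # a user-defined class such as "foo": every attribute has to
              # be walked as well, otherwise foo.x sails through as a fake int
              for key, value in attrs.items():
                  attrs[key] = deep_js_to_python(value)
          return obj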

9) that's ... _horrific_

yep.

10) so... uh... what's the solution?

i don't know - well, i have some ideas, but i'm reluctant to voice
them right now - i wanted to hear what other people have to say before
raising them, in the hope that someone can come up with an alternative
idea.

whilst writing this, it _did_ occur to me that the javascript int
"class" (not that there is the concept of classes in javascript, but
you can emulate a very close approximation) - the javascript "int
object emulator" - could "inherit" from the _real_ python "int"
object. but, again, that really defeats the point of the exercise of
being able to do optimisation!
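
for what it's worth, here is what that inheritance idea looks like in
pure python terms - a hedged sketch only, since the real emulator
would live on the javascript side - and why it defeats the purpose:

      class jsint(int):
          """the 'inherit from the real python int' idea: instances DO pass
          PyInt_Check-style tests, but every instance is still a boxed
          CPython int underneath - exactly the thing we wanted to optimise
          away."""
          pass

      n = jsint(5)
      print(isinstance(n, int))   # True - c-based modules are happy
      print(n + 3)                # 8 - but the arithmetic still goes through
                                  #     intobject.c, so no speedup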


i hope that you can appreciate the implications for unladen swallow,
pretty much simply by substituting "LLVM" for "javascript" in the
above.

this is an issue which you _will_ encounter (if you haven't already) -
it would be ridiculous to _always_ have to bind to the python c
runtime for int, long, str, unicode etc. on the _remote_ off-chance
that someone _might_ use a c-based python module and end up passing
basic types to that c-based module. it makes a mockery of the whole
concept of doing an optimising JIT compiler.

so - gloves off: what are the possibilities?

bearing in mind that both projects face exactly the same barrier, they
will likely be able to take advantage of the same solution.

in particular, i'd like to point out that i _have_ noted the
possibility of modifying the "standard" python c source as something
that you consider to be acceptable (which is good). that's a hint,
btw :) i've said enough.

l.

lkcl

Apr 1, 2009, 2:12:00 PM
to Unladen Swallow
ok - i didn't realise that you were going for modifying _the_ python c
implementation - i thought you were starting from that cython thing.
perhaps that should be on the FAQ and made clear on the web page "we
are not starting from jython, cython, but from _the_ "original" c-
based python implementation".

so, my apologies for the misunderstanding!

ok.

so.

in that case, things like the use of PyInt_Type, in c rather than in
LLVM, might actually be acceptable to you - i don't know.

(it's certainly not acceptable for a javascript-based accelerator to
have that kind of "cross-over" for _every_ single basic data type
instance).

please do let me know, because if it's acceptable to you - that
accelerated apps always call and always use PyInt_Type, PyLong_Type
etc. etc. and always use the standard implementations of same , in
intobject.c, longobject.c etc. then i'm sorry to have taken up your
time with such a long explanation, i'll go away and make the
modifications i believe are needed, without involving you in the
details.

but if not, then please do consider "playing along", to brain-storm
the options and possible solutions.

l.

Fredrik Lundh

Apr 1, 2009, 2:45:48 PM
to luke.l...@googlemail.com, Unladen Swallow
On Wed, Apr 1, 2009 at 8:12 PM, lkcl <luke.l...@googlemail.com> wrote:
>
> ok - i didn't realise that you were going for modifying _the_ python c
> implementation - i thought you were starting from that cython thing.
> perhaps that should be on the FAQ and made clear on the web page "we
> are not starting from jython, cython, but from _the_ "original" c-
> based python implementation".

CPython is the standard name for the main implementation.

http://wiki.python.org/moin/CPython
http://en.wikipedia.org/wiki/CPython

and that name has been used for as long as there have been multiple
implementations; see e.g.

http://jython.sourceforge.net/docs/differences.html

If you check the project site/page/faq again, you'll see that it's
pretty clear that we're building on CPython 2.6 and not something
else.

(but maybe naming the Pyrex fork Cython when CPython was in wide use
wasn't such a good idea ;-)

</F>

Collin Winter

Apr 1, 2009, 3:00:58 PM
to luke.l...@googlemail.com, Unladen Swallow
On Wed, Apr 1, 2009 at 11:12 AM, lkcl <luke.l...@googlemail.com> wrote:
>
>
> ok - i didn't realise that you were going for modifying _the_ python c
> implementation - i thought you were starting from that cython thing.
> perhaps that should be on the FAQ and made clear on the web page "we
> are not starting from jython, cython, but from _the_ "original" c-
> based python implementation".
>
> so, my apologies for the misunderstanding!

That intention is already in the documentation, in particular in
http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Overview and
in the first question on
http://code.google.com/p/unladen-swallow/wiki/FAQ. I'm open to
suggestions of how to make our intention in this regard more obvious.

> in that case, things like the use of PyInt_Type, in c rather than in
> LLVM, might actually be acceptable to you - i don't know.
>
> (it's certainly not acceptable for a javascript-based accelerator to
> have that kind of "cross-over" for _every_ single basic data type
> instance).
>
> please do let me know, because if it's acceptable to you - that
> accelerated apps always call and always use PyInt_Type, PyLong_Type
> etc. etc. and always use the standard implementations of same , in
> intobject.c, longobject.c etc. then i'm sorry to have taken up your
> time with such a long explanation, i'll go away and make the
> modifications i believe are needed, without involving you in the
> details.

It will be acceptable for extension modules to reuse those types and for
any initial interpreter to use those types, but eventually we want to
unbox the Python types for normal Python code, probably as an LLVM
optimization pass.

Collin

lkcl

Apr 5, 2009, 1:58:19 PM
to Unladen Swallow


On Apr 1, 7:00 pm, Collin Winter <collinwin...@google.com> wrote:
> On Wed, Apr 1, 2009 at 11:12 AM, lkcl <luke.leigh...@googlemail.com> wrote:
>
> > ok - i didn't realise that you were going for modifying _the_ python c
> > implementation - i thought you were starting from that cython thing.
> > perhaps that should be on the FAQ and made clear on the web page "we
> > are not starting from jython, cython, but from _the_ "original" c-
> > based python implementation".
>
> > so, my apologies for the misunderstanding!
>
> That intention is already in the documentation, in particular in
> http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Overview and
> in the first question on
> http://code.google.com/p/unladen-swallow/wiki/FAQ. I'm open to
> suggestions of how to make our intention in this regard more obvious.

yeh, i realise that, now - but it hadn't sunk in - at all. i thought
you were going for an optimised version of "Cython" that would then
have to be made "interoperable" with python 2.6.1 c-based modules.

Q: Why branch CPython? Why not use Jython, IronPython or PyPy?

should be:

Q: Why branch CPython (the original and most popular version of Python)?
Why not use Cython, Jython, IronPython or PyPy?


> > please do let me know, because if it's acceptable to you - that
> > accelerated apps always call and always use PyInt_Type, PyLong_Type
> > etc. etc. and always use the standard implementations of same , in
> > intobject.c, longobject.c etc. then i'm sorry to have taken up your
> > time with such a long explanation, i'll go away and make the
> > modifications i believe are needed, without involving you in the
> > details.
>
> It will be acceptable for extension modules to reuse those types and for
> any initial interpreter to use those types, but eventually we want to
> unbox the Python types for normal Python code, probably as an LLVM
> optimization pass.

ok - brilliant.

(btw, sorry for the delay in responding - my baby's been born, so as
you might imagine, i've been busy!)

ok - so, that's good to hear - it means you've thought about these
things and the implications, so if i start describing options it won't
come as a shock to the system :)

from the "javascript-as-optimiser" perspective, there really is very
little that i need to do to "develop" the JS-optimisation side,
whereas you guys unfortunately have one _hell_ of a lot of work to
do :)

if you don't mind, i'll assume that it's ok to continue.

in the brief moments that i've thought about this stuff over the last
few days, the options that occur to me are these:

* keep the exact same data structures for the basic types, and turn
the basic type implementations into dynamically loadable modules
(making damn sure that it's possible to use the "default" existing
behaviour, i.e. intobject.c unmodified). [this should be dead-easy to
do, because e.g. intobject.c already fills in a vector table of
pointers to functions, and so providing a means to point those
pointers-to-functions... somewhere else is a trivial matter].

add a low-level interface which allows direct access to the PyObject
data structure members. e.g. direct access to ob_ival of
PyIntObject. from intobject.h:




* keep the current macros the same but have a #ifdef compilation
option which switches in alternate (dynamic-loadable)
implementations. e.g.

      /* Macro, trading safety for speed */
      #define PyInt_AS_LONG(op) (((PyIntObject *)(op))->ob_ival)

becomes:

      #ifdef STANDARD_PYTHON
      #define PyInt_AS_LONG(op) (((PyIntObject *)(op))->ob_ival)
      #else /* LLVM / Modular-Architecture */
      #define PyInt_AS_LONG(op) (((PyIntObject *)(op))->ops->get_ob_ival())
      #endif

where ops is a pointer to the dynamic-loaded vector-table ... or
perhaps this:

      #define PyInt_AS_LONG(op) (PyIntDynamicLoadedOps->get_ob_ival(op))

then, the intobject.h data structure can become:

      typedef struct {
          PyObject_HEAD
          void* ob_ival;
      } PyIntObject;

where ob_ival can be typecast to a long, in the case of the "default"
behaviour:

      #ifdef STANDARD_PYTHON
      #define PyInt_AS_LONG(op) ((long)((PyIntObject *)(op))->ob_ival)
      #else /* LLVM / Modular-Architecture */
      ...
      #endif


under these circumstances, where PyIntObject now stores a pointer to
an opaque type, the dynamic-module-architecture has a means to place
an object of a type of its own choosing into that pointer (including,
in the case of an int, just using the void* _to_ store an int).

in the case of the javascript-dynamic-module-for-integers, it would be
a pointer to a spidermonkey (or V8) data structure.

i can then implement "int" in python, have it converted to javascript,
and have a dynamic-module-for-integers that "grabs" the integer out of
the int "class".

for the LLVM implementation, you guys would be free to perform similar
tricks.
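
purely as a python-level analogy of that pluggable vector-table idea -
this is not the proposed C change itself, and every name here is made
up for illustration - the shape would be roughly:

      class DefaultIntOps(object):
          """plays the role of the default intobject.c vector table:
          the payload really is the machine integer."""
          def get_ob_ival(self, box):
              return box.payload               # i.e. plain ob_ival access

      class EngineIntOps(object):
          """an alternate, dynamically-loaded implementation: the payload
          is an opaque handle owned by the javascript / LLVM engine."""
          def __init__(self, engine):
              self.engine = engine
          def get_ob_ival(self, box):
              return self.engine.unbox_int(box.payload)   # hypothetical call

      class IntBox(object):
          """plays the role of PyIntObject: a header plus an opaque payload,
          with behaviour supplied by whichever ops table is plugged in."""
          def __init__(self, payload, ops):
              self.payload = payload
              self.ops = ops

      n = IntBox(42, DefaultIntOps())
      print(n.ops.get_ob_ival(n))              # 42 - PyInt_AS_LONG(n), in effect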

... y'know what - it occurs to me that the pypy folks should be
involved, here, because they could conceivably leverage the same
techniques.

it's particularly ironic to think that the best way for optimisation
to progress may be a return to python-level implementations of the
basic data types, much as python 1.5 did for e.g. the "long" data
type.

l.

lkcl

Apr 5, 2009, 6:25:41 PM
to Unladen Swallow
> add a low-level interface which allows direct access to the PyObject
> data structure members. e.g. direct access to ob_ival of
> PyIntObject. from intobject.h:

      typedef struct {
          PyObject_HEAD
          long ob_ival;
      } PyIntObject;

whoops, sorry, i cut out that bit of context there, in my cut/paste
haste to paste it further down when suggesting ob_ival be made a
void*. the usual ILP data-model rules mean that sizeof(int) <=
sizeof(long) <= sizeof(void *), so it's perfectly ok to store a long
in a pointer-sized field but not the other way round. fortunately it's
the long that needs to be stored, not a pointer.

l.

Flier Lu

Apr 8, 2009, 12:17:51 PM
to Unladen Swallow
I'm not sure what the original issue is that you want to resolve, but
in my opinion, the design depends directly on the mapping strategy
between the javascript and python type systems.

For example, in the PyV8 implementation, I chose an implicit
conversion strategy for the built-in types.

>>> import PyV8
>>> ctxt = PyV8.JSContext()  # create a context with an implicit global object
>>> ctxt.enter()             # enter the context (also supports the with statement)
>>> ctxt.eval("1+2")         # evaluate the javascript expression
3                            # returns a native python int

It means that the built-in types in both systems, such as int, float,
datetime etc., will be automatically converted between javascript and
python. You can pass a python string directly to a javascript function
and get a native python int back:

>>> import PyV8
>>> ctxt = PyV8.JSContext()
>>> ctxt.enter()
>>> func = ctxt.eval("function (str) { return str.length; }")
>>> func(["hello"])          # I will fix this soon; for now, pass the arguments as a list
5
>>> type(func(["hello"]))
<type 'int'>

on the other hand, for the javascript Array object, PyV8 uses the
__len__/__getitem__/__setitem__/__delitem__/__iter__ method set to
simulate a python container type, and vice versa,

so the user can use javascript/python objects as first-class types in
python/javascript.
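
as a rough illustration of what that container-protocol wrapping looks
like on the python side (a sketch only - this is not PyV8's actual
wrapper class, which talks to a live V8 handle):

      class JSArrayWrapper(object):
          """sketch of how a javascript Array handle can be made to behave
          like a python container by implementing the usual protocol
          methods."""
          def __init__(self, items):
              self._items = list(items)     # stands in for the V8 handle

          def __len__(self):
              return len(self._items)

          def __getitem__(self, index):
              return self._items[index]

          def __setitem__(self, index, value):
              self._items[index] = value

          def __delitem__(self, index):
              del self._items[index]

          def __iter__(self):
              return iter(self._items)

      arr = JSArrayWrapper([1, 2, 3])
      print(len(arr), list(arr), arr[0])    # behaves like a python sequence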

besides, user-defined javascript/python classes will be wrapped by
CJavascriptObject/CPythonObject; you can check Wrapper.h/.cpp for
more detail:

http://code.google.com/p/pyv8/source/browse/trunk/src/Wrapper.cpp