Thinking about PyPy

Timothy Baldridge

May 23, 2012, 3:09:43 PM
to clojure...@googlegroups.com
Recently I've been putting a lot of thought into the Clojure-Py
compiler and what my goals are with it. I guess to start with I should
explain the three viewpoints on what we should be doing with
Clojure-Py:

1) ClojureScript integration - The devs of ClojureScript have asked
several times why I wrote a custom compiler instead of adapting
ClojureScript. The reason I've given is that I want native macros, and
I don't want to require Java in order to run Clojure code. However, we
could at some point get ClojureScript to compile on Clojure-Py, and then
we'd have a self-hosting compiler.

2) The Python gurus think it's cool that Clojure-Py runs directly on
the Python VM. It doesn't need anything besides about 10KB worth of
compressed .py files. It runs anywhere CPython does.

3) The PyPy devs have asked, "Why didn't you just write it as a PyPy
module?" They are right: if we want excellent performance (performance
that I think could surpass that of Clojure on the JVM), then writing
Clojure-Py in RPython is the way to go.


It's this last item that interests me the most. Let me explain (for
those who don't know how PyPy works) what this would involve.

1) We'd adapt all of clojure/lang to run in RPython (Restricted Python).
2) We'd dump the compiler. In its place we'd write a Clojure
interpreter. So, executing interpreter/eval on an ISeq would execute
that sequence. Against a vector, it would construct a vector, etc.
3) We'd annotate the source of this interpreter with a few hints for PyPy.
4) The PyPy translation toolchain would generate a JIT specifically for Clojure code.
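
To make step 2 a bit more concrete, here is a minimal sketch of what such an eval function might look like. It is plain Python and has not been checked against RPython's restrictions. The `Symbol` class, the `evaluate` name, and the use of a Python tuple for an ISeq and a Python list for a vector are all assumptions made for illustration, not Clojure-Py's actual types.

```python
import operator


class Symbol(object):
    """Hypothetical stand-in for clojure.lang.Symbol."""

    def __init__(self, name):
        self.name = name


def evaluate(form, env):
    """Evaluate a form directly, with no compile step in between."""
    if isinstance(form, Symbol):
        return env[form.name]          # symbols resolve in the environment
    if isinstance(form, list):         # list = stand-in for a Clojure vector
        return [evaluate(item, env) for item in form]
    if isinstance(form, tuple):        # tuple = stand-in for an ISeq
        if not form:
            return form                # the empty list evaluates to itself
        head = form[0]
        if isinstance(head, Symbol) and head.name == "if":
            cond = evaluate(form[1], env)
            # Clojure truthiness: only nil (None) and false are falsy.
            falsy = cond is None or cond is False
            return evaluate(form[3] if falsy else form[2], env)
        fn = evaluate(head, env)
        args = [evaluate(arg, env) for arg in form[1:]]
        return fn(*args)
    return form                        # numbers, strings, etc. are literals


env = {"+": operator.add, "x": 10}
print(evaluate((Symbol("+"), 1, 2), env))
print(evaluate([Symbol("x"), (Symbol("+"), Symbol("x"), 1)], env))
```

Executing against a seq calls the function at its head; executing against a vector builds a new vector of evaluated items. A real version would need the remaining special forms, macros, and namespaces.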

With this approach, we would lose support for CPython, but gain a
massive boost in speed.

For instance, examine this code:

(def ^:dynamic foo 0)

(binding [foo 1]
  (dotimes [x 10000]
    (println (str foo x))))


In this code, every time we look up the value of foo, Clojure-JVM will
deref the var. This causes a search in the hashmap for the
threadLocalBindings of vars. This happens for every single deref. Not
so in Clojure on PyPy. In this case, after the first 1000 iterations,
the JIT would auto-inline the lookup of foo. From then on it would
just do a single assert each time to make sure that the var lookup
hashmap hasn't changed.
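
The difference between the two lookups can be imitated by hand in plain Python. This is only a sketch of the idea, not PyPy output: `Var`, `binding`, and `GuardedDeref` are hypothetical names, and it assumes binding frames are never mutated in place and that the root value is not redefined while a cached value is in use.

```python
import threading

_state = threading.local()


def _frames():
    """Per-thread stack of binding frames; the bottom frame is empty."""
    if not hasattr(_state, "frames"):
        _state.frames = [{}]
    return _state.frames


class Var(object):
    def __init__(self, name, root):
        self.name = name
        self.root = root

    def deref(self):
        # Slow path: a hash lookup on every single deref.
        return _frames()[-1].get(self, self.root)


class binding(object):
    """Context manager that pushes a new frame, loosely like `binding`."""

    def __init__(self, mapping):
        self.mapping = mapping

    def __enter__(self):
        frames = _frames()
        frame = dict(frames[-1])   # copy: frames are never mutated in place
        frame.update(self.mapping)
        frames.append(frame)

    def __exit__(self, *exc_info):
        _frames().pop()


class GuardedDeref(object):
    """Cached value plus a guard, written out by hand."""

    def __init__(self, var):
        self.var = var
        self.frame = None
        self.value = None

    def __call__(self):
        frame = _frames()[-1]
        if frame is not self.frame:        # the guard: one identity check
            self.frame = frame             # guard failed, redo the lookup
            self.value = frame.get(self.var, self.var.root)
        return self.value


foo = Var("foo", 0)
fast_foo = GuardedDeref(foo)
with binding({foo: 1}):
    for x in range(3):
        print(str(fast_foo()) + str(x))
```

Inside the loop only the first call pays for the hash lookup; every later call is an identity comparison followed by returning the cached value.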

Now, in the above example this isn't much of a speed boost, but here's
the deal. The JIT of PyPy tries to find immutable data structures, and
when it does, it starts inlining code and performing constant
propagation. Well, Clojure is almost 100% immutable, so we're talking
about some insane speedups.

So, if we included Clojure as a PyPy module, then users would be
forced to download PyPy, but they could still use Python modules,
ctypes, etc. from Clojure-Py. So what I'm debating is whether all this
is worth it. Do you think we'd alienate too many people by going to a
PyPy-only model?

From what I'm reading of the PyPy devs, if we don't bloat the PyPy
code too much, they might be willing to just include Clojure-Py in the
stock PyPy interpreter. So then we could actually tell people "hey,
download the latest PyPy and you have clojure...."

I'm looking for thoughts/ideas here. Originally I started down the
PyPy-only path, but didn't understand Clojure well enough at the time
to feel that I would make the right decisions. At this point, I think
we could pull it off.


These videos go into this in much more depth:

http://pyvideo.org/video/662/how-the-pypy-jit-works
http://pyvideo.org/video/661/why-pypy-by-example
http://pyvideo.org/video/612/how-to-get-the-most-out-of-your-pypy


Timothy Baldridge

Eric Shull

May 23, 2012, 3:40:18 PM
to clojure...@googlegroups.com
It sounds like it's worth doing, but it also sounds like a fork of Clojure-Py.

Gary Poster

May 23, 2012, 5:12:04 PM
to clojure...@googlegroups.com
On 05/23/2012 03:09 PM, Timothy Baldridge wrote:
> I'm looking for thoughts/ideas here. Originally I started down the
> PyPy only path, but didn't understand Clojure well enough at the time
> to feel that I would make the right decisions. At this point, I think
> we could pull it off.

From the perspective of an interested bystander, the PyPy approach seems
compelling to me. PyPy's speed would be a big selling point, and PyPy
is hopefully starting to enter the mainstream, in my perception. I took
the recent re-inclusion into Debian Universe as a positive sign.

Further, as someone naive to the ClojureScript implementation, is there
some way to combine the ClojureScript approach with the PyPy approach?
Have a minimal subset of Clojure in PyPy, with the rest in ClojureScript?

Gary

Antony Lee

May 23, 2012, 5:12:24 PM
to clojure...@googlegroups.com
Not sure why this would mean dropping CPython support.  A Clojure interpreter running on CPython would probably be awfully slow (like untranslated PyPy), but it's not clear to me that it would be (much) slower than the current byteplay-based solution.
Anyways, looks very interesting to me.
Antony

2012/5/23 Eric Shull <eric....@gmail.com>

Timothy Baldridge

May 23, 2012, 5:38:22 PM
to clojure...@googlegroups.com
> Not sure why this would mean dropping CPython support.  A Clojure
> interpreter running on CPython would probably be awfully slow (like
> untranslated PyPy), but it's not clear to me that it would be (much) slower
> than the current byteplay-based solution.

Well, it's another layer of abstraction. Currently Clojure-Py
translates functions into the same bytecode that Python uses. From
there, Python's VM thinks they are just normal Python functions. With
this approach we'd basically have an interpreter running on top of an
interpreter. PyPy does this same sort of thing for debugging. You can
boot up Python on top of Python (hence the name). Granted, it takes
about 30 seconds to get to the REPL, but it works.
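
The two approaches can be put side by side in a few lines. Clojure-Py itself emits bytecode through byteplay; the sketch below swaps in the standard library's `ast` module and `compile()` as a stand-in, only because that is easier to run. The form representation (nested tuples with a string operator) and the function names are assumptions for illustration.

```python
import ast
import operator

AST_OPS = {"+": ast.Add, "-": ast.Sub, "*": ast.Mult}
FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def to_python_ast(form):
    """Translate a nested (op, left, right) form into a Python AST node."""
    if isinstance(form, tuple):
        op, left, right = form
        return ast.BinOp(left=to_python_ast(left),
                         op=AST_OPS[op](),
                         right=to_python_ast(right))
    return ast.Constant(value=form)


def compile_form(form):
    """Compiler route: produce a code object the Python VM runs natively."""
    tree = ast.Expression(body=to_python_ast(form))
    ast.fix_missing_locations(tree)
    return compile(tree, "<form>", "eval")


def interpret(form):
    """Interpreter route: walk the form itself, one more layer on the VM."""
    if isinstance(form, tuple):
        op, left, right = form
        return FUNCS[op](interpret(left), interpret(right))
    return form


form = ("+", 1, ("*", 2, 3))
print(eval(compile_form(form)))
print(interpret(form))
```

Both routes give the same answer. On CPython the second one pays for a Python-level dispatch at every node, which is the extra layer described above.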

My concern with ClojureScript is that PyPy is going to be a totally
different beast. For instance, look at this example:
http://hasandiwan.info/2010/10/how-to-interpret-lisp-in-python.html

In that link we can see that we don't even really need a
compiler to implement a Lisp interpreter. In Lisp, code is data, and
data is code, so why compile the source forms into bytecode? This is
the beauty of PyPy: we don't need a compiler, and therefore
ClojureScript is mostly useless to us.

Anyway, I have a 4 day weekend this week (Memorial Day here in the
US), so I'll start hacking on this a bit and see where I get.

Timothy

Ulrich Küttler

May 23, 2012, 5:59:07 PM
to clojure...@googlegroups.com

Clojure has always been a compiled language. It has always been a language that aims for popular platforms. These are major points in favor of a CPython implementation.

I doubt that execution speed is the main concern of many py-clojure users.

With CPython, py-clojure can be used in places where there is no Java. If there is no py-clojure, people will start to learn Java. I did.

That being said, PyPy is an exciting beast. It is just that a PyPy module will have less impact than a CPython version.

Ulrich

On 23.05.2012 at 21:09, "Timothy Baldridge" <tbald...@gmail.com> wrote:

Adam Feuer

May 23, 2012, 11:27:06 PM
to clojure...@googlegroups.com
Tim,

I like the fact that I can use Clojure-py wherever I use CPython.

I also like the simplicity and speed that would come with the PyPy strategy.

When it comes down to it, I would download and use PyPy to use
Clojure-py. I want a Clojure that can use Python modules, and running
on PyPy would do that.

-adam
--
Adam Feuer <ad...@pobox.com>

John Gabriele

May 24, 2012, 2:03:42 AM
to clojure...@googlegroups.com
On Wed, May 23, 2012 at 3:09 PM, Timothy Baldridge <tbald...@gmail.com> wrote:
>
> I'm looking for thoughts/ideas here.

Hi Timothy,

Here's what I'd most like to see clojure-py provide:

1. access to existing Python libs
2. reasonably fast start-up time
3. reasonably simple system

I see that you're particularly interested in performance. Also, you
mentioned you'd rather not depend upon or require Java
(understandable).

Do you see any way to have a small, simple, high-performance Clojure
system plus a bridge to talk to Python libs? Or maybe some way to
embed CPython in such a Clojure system?

---John

Antony Lee

May 24, 2012, 2:55:58 AM
to clojure...@googlegroups.com
Right now clojure-py can already use more or less any Python module.  For example, the following code does what you'd expect (using the latest master branch):

(require '[PyQt4.QtGui :as qt])
(let [app (qt/QApplication (py/list [])), w (qt/QWidget)]
  (.setWindowTitle w "abc")
  (.show w)
  (sys/exit (.exec_ app)))

(actually it segfaults when I close the window, but I guess that shouldn't be too hard to fix :-))

The startup time is not too bad either, at ~3.5s for the REPL on my laptop (at least when you compare with clojure-jvm, at ~2.8s, as it suffers from the startup time of the (server) jvm itself).

Antony

2012/5/23 John Gabriele <jmg...@gmail.com>

Konrad Hinsen

May 24, 2012, 4:11:37 AM
to clojure...@googlegroups.com
On 23 May 2012 14:09:43 -0500, Timothy Baldridge <tbald...@gmail.com>
wrote:

> So, if we included Clojure as a PyPy module, then users would be
> forced to download PyPy, but they could still use Python modules,
> ctypes, etc. from Clojure-Py. So what I'm debating is if all this is
> worth it? Do you think we'd alienate too many people by going to a
> PyPy only model?

PyPy runs Python modules, but the vast majority of "extension modules"
(compiled modules written in C, C++, Cython...) are not compatible with
PyPy at this time. For some Python applications (in particular scientific
computing, which is what I know best), this makes PyPy next to useless. The
huge collection of extension modules, many of which are interfaces to
libraries written in C or Fortran, is one of the main reasons why
scientists use Python.

PyPy support for CPython extension modules has improved over time, but does
not seem to be the PyPy developers' priority.


I wonder how much effort it would be to aim for several goals in parallel.
Writing the support code in RPython doesn't make it unsuitable for CPython.
If we had both an interpreter layer for PyPy *and* a compiler to Python
bytecode, both sharing much of the support code, we could cater for both
the CPython platform and the PyPy platform. The compiler could even be
based on ClojureScript, to satisfy everyone.

Konrad.

Antony Lee

May 24, 2012, 4:26:14 AM
to clojure...@googlegroups.com


2012/5/24 Konrad Hinsen <google...@khinsen.fastmail.net>

> On 23 May 2012 14:09:43 -0500, Timothy Baldridge <tbald...@gmail.com> wrote:
>
>> So, if we included Clojure as a PyPy module, then users would be
>> forced to download PyPy, but they could still use Python modules,
>> ctypes, etc. from Clojure-Py. So what I'm debating is if all this is
>> worth it? Do you think we'd alienate too many people by going to a
>> PyPy only model?
>
> PyPy runs Python modules, but the vast majority of "extension modules" (compiled modules written in C, C++, Cython...) are not compatible with PyPy at this time. For some Python applications (in particular scientific computing, which is what I know best), this makes PyPy next to useless. The huge collection of extension modules, many of which are interfaces to libraries written in C or Fortran, is one of the main reasons why scientists use Python.

Have you ever tried the "CPython embedding hack" (http://morepypy.blogspot.com/2011/12/plotting-using-matplotlib-from-pypy.html)?  It's been on my todo-list for a while but with only 2 GB of RAM compiling pypy takes forever on my laptop...  Anyways, if it works (modulo a couple of wrappers to make the experience easier, but that's probably not the hardest part to do) then this would be great :-)

> PyPy support for CPython extension modules has improved over time, but does not seem to be the PyPy developers' priority.
>
> I wonder how much effort it would be to aim for several goals in parallel. Writing the support code in RPython doesn't make it unsuitable for CPython. If we had both an interpreter layer for PyPy *and* a compiler to Python bytecode, both sharing much of the support code, we could cater for both the CPython platform and the PyPy platform. The compiler could even be based on ClojureScript, to satisfy everyone.

Yes, having the support code in RPython seems reasonable.

> Konrad.

Antony

Antony

Konrad Hinsen

May 24, 2012, 3:17:07 PM
to clojure...@googlegroups.com
Antony Lee writes:

> Have you ever tried the "CPython embedding hack"
> (http://morepypy.blogspot.com/2011/12/plotting-using-matplotlib-from-pypy.html)?
> It's been on my todo-list for a while but with only 2 GB of RAM
> compiling pypy takes forever on my laptop...  Anyways, if it works
> (modulo a couple of

I have 4 GB, I don't know if that's enough. Until now, I have only run
PyPy from precompiled binaries. But that hack looks interesting, maybe
I will give my CPU a workout ;-)

Konrad.

John Gabriele

May 24, 2012, 4:41:15 PM
to clojure...@googlegroups.com
On Wed, May 23, 2012 at 5:38 PM, Timothy Baldridge <tbald...@gmail.com> wrote:
>
> My concern with ClojureScript is that PyPy is going to be a totally
> different beast. For instance, look at this example:
> http://hasandiwan.info/2010/10/how-to-interpret-lisp-in-python.html
>
> {snip} This is
> the beauty of PyPy, we don't need a compiler, and therefore
> ClojureScript is mostly useless to us.
>
> Anyway, I have a 4 day weekend this week (Memorial Day here in the
> US), so I'll start hacking on this a bit and see where I get.

Timothy,

Sounds like you've been bitten by the PyPy bug and probably should see
where that leads. :)

Perhaps even create a 2nd project; say, "clojure-pypy".

That way, if a user wants access to CPython extension modules and no
additional dependencies outside of CPython, then maybe clojure-py is
their best bet.

And if a user wants all-out maximum performance, and doesn't mind a
more complex system and having to install PyPy, then maybe
clojure-pypy is what they're looking for.

---John

Antony Lee

May 24, 2012, 7:40:29 PM
to clojure...@googlegroups.com
What do you mean by "a more complex system"?  It *may* be more complex from a programming POV (not even clear IMHO -- if the compiler starts emitting an AST with treadle instead of bytecode then we "just" need to write an AST interpreter) but the experience for the end user should be the same (if they don't require C interfacing, OR if the C interfacing hack can be made simpler -- but again that is probably just a matter of writing an import hook).
Antony

2012/5/24 John Gabriele <jmg...@gmail.com>

John Gabriele

May 24, 2012, 11:03:31 PM
to clojure...@googlegroups.com
On Thu, May 24, 2012 at 7:40 PM, Antony Lee <anton...@berkeley.edu> wrote:
>
>> Perhaps even create a 2nd project; say, "clojure-pypy".
>>
>> That way, if a user wants access to CPython extension modules and no
>> additional dependencies outside of CPython, then maybe clojure-py is
>> their best bet.
>>
>> And if a user wants all-out maximum performance, and doesn't mind a
>> more complex system and having to install PyPy, then maybe
>> clojure-pypy is what they're looking for.
>
> What do you mean by "a more complex system"?

I meant the implementation.

My meager understanding of PyPy is that it's substantially larger and
more complex than CPython, but that it runs code very swiftly.

Incidentally, I found [David B's PyCon 2012
keynote](http://pyvideo.org/video/659/keynote-david-beazley) (which is
primarily about PyPy) to be pretty interesting.

Christophe Grand

May 29, 2012, 3:40:52 AM
to clojure-py-dev
Hi,

On 23 May, 21:09, Timothy Baldridge <tbaldri...@gmail.com> wrote:
> Recently I've been putting a lot of thought into the Clojure-Py
> compiler and what my goals are with it. I guess to start with I should
> explain the three viewpoints of what we should be doing with
> Clojure-Py
>
> 1) ClojureScript integration - The devs of ClojureScript have asked
> several times why I wrote a custom compiler instead of adapting
> ClojureScript. The reason I've given is that I want native macros, and
> I don't want to have Java to be able to run Clojure code. However, we
> could at some point get ClojureScript to compile on Clojure-Py then
> we'd have a self hosting compiler.

Off the top of my head, here are the things that hinder running the
ClojureScript compiler on ClojureScript but which (afaik) don't affect
running it on Clojure-Py, and thus would allow native macros:
* dependency on Google Closure
* no metadata on symbols
* non-reified vars
* no standard runtime/OS API (node.js?) to access files

> 3) The PyPy devs have asked "why didn't you just write it as a PyPy
> Module". They are right, if we want excellent performance (performance
> that could surpass that of Clojure on the JVM I think) then writing
> Clojure-Py in RPython is the way to go.
>
> It's this last item that interests me the most. Let me explain (for
> those who don't know how PyPy works) what this would involve.
>
> 1) We'd adapt all of clojure/lang to run in RPython (restricted python).
> 2) We'd dump the compiler. In its place we'd write a Clojure
> interpreter. So, executing interpreter/eval on an ISeq would execute
> that sequence. Against a vector, it would construct a vector, etc.
> 3) We annotate the source of this interpreter with a few hints for PyPy
> 4) The PyPy interpreter creates a JIT specifically for Clojure code.

Wouldn't it be possible to have the compiler emit RPython code and
still benefit from PyPy?
Could such a compiler be written on top of the cljs one?

Sorry for the naive questions,

Christophe