1) writing a Clojure interpreter in RPython, and then having PyPy
generate a JIT for you. This would, by far, create the fastest
solution. PyPy loves creating JITs for languages with immutable data
and pure functions. In fact, the JIT goes bonkers with optimization
when it comes to programs that meet these criteria. So going this
route would allow you to tell the JIT generator that all your
functions are pure, and all your data is immutable. The bad thing is,
you have no library...you have to write your entire runtime library
yourself.
2) writing a Clojure -> Python translator, then running the resulting
code in PyPy (like ClojureScript does with JS). Here the JIT will be a
bit more "unhappy" since you could change the classes at any moment.
The fact that you don't change them means that the performance
impact will be lessened somewhat, but the impact will still be there.
Both solutions will take two things from you: 1) the extensive
library of the JVM. The more I work with the JVM (I'm a .NET guy), the
more I see the value of the platform. 2) you lose concurrency. Most
PyPy code is not thread safe, so there's that.
But I have to disagree with Bret on his comments about how advanced
the JVM is. True, the Sun JVM is advanced, but you have to understand
how truly complex the JVM bytecode is. On top of this, no JVM I know
actually implements a tracing JIT. This is where PyPy excels. PyPy
profiles the code while it is running and does some truly insane
optimizations. For instance, PyPy will rip apart data structs. So if
Foo.x is the only member used from the Foo struct in function Bar,
PyPy will re-write a version of Bar on-the-fly so that Bar takes a
single int instead of a full struct. PyPy then also removes unneeded
allocations, etc. Basically it unboxes primitives and generates code
using those primitives while the program is running.
On top of all that, the tracing JIT of PyPy will string functions
together, finding the loops in the actual code, then JIT native code
to represent these loops.
The net effect of this is that many functions (regex engines, string
functions, and yes, even video processing) run just as fast in PyPy as
in pure C code. And in some cases, PyPy can generate code that runs
faster than hand-written C code.
So all that to say, yes, I think there's a lot of potential in PyPy,
but translating Clojure to it is no small task. And even when you're
done, you're still in the same boat as Clojure-CLR and
ClojureScript...no matter how good you are, you're still not "real"
Clojure.
Timothy
> ...The main point the poster should take away is the lack of any library/runtime tools, you have to build from the ground up many things you take for granted when targeting the JVM.
I've thought about Clojure on PyPy too. My thought was that you would make Clojure an RPython addition to the RPython Python. That would give you a pretty good start at a library, at least.
When I glanced at it, PyPy's build tools didn't seem to be sophisticated enough to handle this as some kind of add-in to the Python implementation: you would have had to fork PyPy to get this to work, or spend some time on the build tools. Maybe I missed something.
I also felt that sticking with the official Java implementation of Clojure would be more practical. It would certainly be fun to put Clojure on PyPy, though.
Gary
True...true, it supports it, but it's still a "2nd class citizen". For
instance, we don't have lein, ring, parts of contrib, IDEs, etc. All
the examples, all the books are all about Clojure on the JVM. Do
anything else and you can forget using Clojars, and 90% of the
ecosystem built around Clojure.
The other issue here is that translating Clojure to the CLR is a fairly
straightforward task. C# and Java support most of the same language
features. Not so in Python. For instance, Python does not have
anonymous classes, anonymous multi-line lambdas (e.g. #(do (expr1)
(expr2)) ), or overloaded constructors. So now, the entire structure
of the code has to change.
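To make the lambda point concrete, here is a rough Python 3 sketch of what a Clojure -> Python translator might have to do with something like #(do (expr1) (expr2)). A Python lambda holds a single expression, so the body gets hoisted into a named def. The helper name lower_fn and the generated-name scheme are made up for illustration; no existing translator is being quoted here.

```python
import itertools

_counter = itertools.count()


def lower_fn(params, body_exprs):
    """Hoist a multi-expression anonymous function into a named def.

    params: list of parameter names.
    body_exprs: list of already-translated Python expression strings;
    the last one becomes the return value.
    Returns (generated_name, python_source).
    """
    name = "_fn_%d" % next(_counter)
    lines = ["def %s(%s):" % (name, ", ".join(params))]
    for expr in body_exprs[:-1]:
        lines.append("    " + expr)               # evaluated for side effects
    lines.append("    return " + body_exprs[-1])  # value of the whole (do ...)
    return name, "\n".join(lines)


# Roughly (fn [x] (do (.append log x) (* x 2)))
name, source = lower_fn(["x"], ["log.append(x)", "x * 2"])
namespace = {"log": []}
exec(source, namespace)
result = namespace[name](21)
```

The call site then refers to the generated name instead of an inline lambda, which is exactly the kind of structural change mentioned above.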
Now, what should be done at some point (IMO) is to implement Clojure
in Clojure. This is why PyPy is so successful. They have written a
Python interpreter in Python and then written a translator that can
take a Python program and write a JIT for it. That JIT generator can
then output many different code types. For instance, backends for C,
LLVM, .NET and JVM all exist for PyPy...IMO this is what Clojure would
need to be completely portable.
ClojureScript comes close: all of ClojureScript is written in Clojure.
However, the actual code generation of ClojureScript is complected into
the compiler. So there's a function in the compiler called "emit-call",
but that function assumes you will always use it to emit JS code. If
we were able to abstract this compiler a bit more, we could write
backends for Java/.NET/JS/PyPy/PHP(yuck) or whatever.
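A minimal sketch of what "abstracting the compiler a bit more" could look like, written in Python 3 rather than Clojure: the tree walk is shared, and only the emit_* methods know the target syntax. The class names, the tuple-based AST, and the emit_if method are assumptions for illustration; only the name emit-call comes from the ClojureScript compiler.

```python
class JSBackend:
    def emit_call(self, fn, args):
        return "%s(%s)" % (fn, ", ".join(args))

    def emit_if(self, test, then, else_):
        return "(%s ? %s : %s)" % (test, then, else_)


class PythonBackend:
    def emit_call(self, fn, args):
        return "%s(%s)" % (fn, ", ".join(args))

    def emit_if(self, test, then, else_):
        return "(%s if %s else %s)" % (then, test, else_)


def compile_form(form, backend):
    """Compile a tiny AST: an atom string, ("if", t, a, b) or ("call", fn, *args)."""
    if isinstance(form, str):
        return form
    head = form[0]
    if head == "if":
        _, test, then, else_ = form
        return backend.emit_if(compile_form(test, backend),
                               compile_form(then, backend),
                               compile_form(else_, backend))
    if head == "call":
        args = [compile_form(arg, backend) for arg in form[2:]]
        return backend.emit_call(form[1], args)
    raise ValueError("unknown form: %r" % (form,))


form = ("if", ("call", "is_even", "n"), "a", ("call", "f", "n", "1"))
js_code = compile_form(form, JSBackend())
py_code = compile_form(form, PythonBackend())
```

Adding another target then means writing one more backend class, not touching compile_form.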
Finally, Clojure does make a few assumptions about the underlying VM.
Clojure assumes that the entire system will run on a VM that only
supports OOP. Hence every single function is an object. Now look at
PyPy or JS; here functions are true values in the VM. So the best way
to implement Clojure in PyPy would be to take advantage of these
facilities. But then you're really redefining what Clojure is and how
it runs. PyPy has a very advanced JIT that works well with dynamic
typing, but if you implement stock Clojure on PyPy you'd be ignoring
all that. Look at RT.java, you'll see tons of "if (obj is ISeq)... if
(obj is Array)...". Much of that could be thrown away in a PyPy
implementation, but if you threw that away, now you have a headache
when the Clojure devs start updating the code, and you have to port
those changes to your code base.
Sorry for the rambling...
Timothy
There is one insanely off-the-wall idea I've been thinking about
recently, however:
1) Implement a full JVM in PyPy using GNU Classpath. Write a Java
bytecode interpreter in PyPy. This way you get the power of PyPy
(tracing JIT) with the power of Java (classpath)
2) Run stock JVM Clojure on this interpreter
3) ....
4) Profit!
Timothy
--
“One of the main causes of the fall of the Roman Empire was
that–lacking zero–they had no way to indicate successful termination
of their C programs.”
(Robert Firth)
>> I also felt that sticking with the official Java implementation of Clojure would be more practical. It would certainly be fun to put Clojure on PyPy, though.
>
> There is one insanely off-the-wall idea I've been thinking about
> recently, however:
>
> 1) Implement a full JVM in PyPy using GNU Classpath. Write a Java
> bytecode interpreter in PyPy. This way you get the power of PyPy
> (tracing JIT) with the power of Java (classpath)
> 2) Run stock JVM Clojure on this interpreter
> 3) ....
> 4) Profit!
LOL, that is awesomely insane! It would certainly be fascinating to see how it turned out in terms of comparative performance.
Gary
Very true. But what it does do is abstract away the core elements of
Clojure. ISeq, IPersistentList, PersistentList, etc. are all
cross-platform. All you need is an "emit-class" function that
understands how to generate private members, methods, and handle
inheritance. All of the concurrency routines can be handled via classes
and CompareAndSwap. What I would love is to be able to tell the Clojure
compiler "here is how you generate classes, here is how you call
CompareAndSwap....now go and build me classes found in Ref.java,
LockingTransaction.java, Atom.java, etc."
This is pretty much the way GNU Classpath works. You have to define
the lowest level structs (VMInt, VMObject, VMFile, VMSocket, etc.) and
from there every single part of the class library is simply Java
abstractions of those base classes.
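As an illustration of building a concurrency type on nothing but a class plus a CompareAndSwap primitive, here is a Python 3 sketch of an Atom-like type. The Cell class is a stand-in for whatever the host platform provides (a lock is used here in place of a real CAS instruction), and the names Cell, compare_and_set, and swap are assumptions, not a port of Atom.java.

```python
import threading


class Cell:
    """Host-provided primitive: a mutable slot with atomic compare-and-set."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stand-in for a hardware CAS

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Atom:
    """Everything here is written only in terms of Cell."""

    def __init__(self, value):
        self._cell = Cell(value)

    def deref(self):
        return self._cell.get()

    def swap(self, f, *args):
        # Retry until no other thread changed the value in between.
        while True:
            old = self._cell.get()
            new = f(old, *args)
            if self._cell.compare_and_set(old, new):
                return new


counter = Atom(0)
counter.swap(lambda n, step: n + step, 5)
```

Porting Atom to a new platform would then only require a new Cell; the retry loop stays the same.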
Timothy
RPython is very restrictive. Basically it's garbage-collected C++ with
a different syntax. This means we can't import modules at runtime,
so any additional libraries must be added via C FFI. As an example,
let's take a look at core/slurp. To properly implement this function,
we need HTTP support...but where do we get this from? On the JVM and
CLR this is simple. But in PyPy we need to go and find an HTTP library,
find a way to link it in, figure out how to call its methods via FFI,
and then figure out how to dispose of any memory it creates. So what
used to be a simple 5 lines of code calling HttpWebRequest (on the
CLR) has now ballooned into a lib requirement, FFI, and a bunch of
support routines. Or we can simply say "slurp can only read from
files", which means now you just create a doc of "ways clojure-pypy
differs from clojure-jvm".
In some ways, this is why I'm in favor of going with a clojure->python
translator. There would be a small impact in speed, but this means you
could leverage the existing Python ecosystem. And to be honest,
debugging Python is way better than RPython, for example:
    def foo(x, y):
        return x + y

    x = 0
    print foo(x, 1.0)
    print foo(x, 1)
Will the above compile? In Python, yes, in RPython, no....and here is
the error you will receive:
"Resolution of foo has revolved to SomeObject. Previous definitions
are foo(int, float), at foo(int int)"
Now if you work with RPython long enough...that may make sense...but
most likely you won't quite understand that this is complaining
because you've already used the function with a float argument,
and later you use it with an int argument.
So to sum this up: with RPython you lose 100% of the ecosystem, and
concurrency. With Python you still lose concurrency and some
performance, but you keep the ecosystem.
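For anyone who wants a feel for the "one type per argument" rule without installing the PyPy toolchain, here is a toy Python 3 decorator that enforces it at call time. This is NOT how RPython works (RPython checks this statically over the whole program, and its error text differs); the decorator name monomorphic is made up for illustration.

```python
import functools


def monomorphic(fn):
    """Reject calls whose argument types differ from the first call's types."""
    seen = {}  # argument position -> type first observed at that position

    @functools.wraps(fn)
    def wrapper(*args):
        for position, arg in enumerate(args):
            expected = seen.setdefault(position, type(arg))
            if type(arg) is not expected:
                raise TypeError("%s: argument %d was %s, now %s" % (
                    fn.__name__, position,
                    expected.__name__, type(arg).__name__))
        return fn(*args)

    return wrapper


@monomorphic
def foo(x, y):
    return x + y


first = foo(0, 1.0)      # fixes the argument types as (int, float)
try:
    foo(0, 1)            # second argument is now an int
    error = None
except TypeError as exc:
    error = str(exc)
```

Plain Python accepts both calls; the toy check rejects the second one, which is the same class of complaint as the RPython error quoted above.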
-----
Now let me say this: I'm in favor of Clojure on PyPy. It'd take a lot
more work than it looks at first, but it's not impossible. I'm for
this project, and I'd love to help. Sadly, I've got too many other
things on my plate right now to head this up. But if someone else
creates the project, and gets it going, I'll make regular commits and
help where/whenever I can.
Also, I'm only half joking about implementing a JVM on PyPy...someone
recently implemented the JVM in JavaScript, so it's not that hard.
Timothy
The problems Timothy mentions are very real, but I (at least at first)
don't really care about IO problems. Clojure is different on every
platform anyway, so it is fine if Clojure-pypy does not support
everything Clojure does. I would go at it by first writing an
interpreter for a language that supports everything ClojureScript
does. Then, in a second step, make the program conform to the RPython
restrictions. In a third step, maybe create the IO facilities.
I just don't like the workflow of compiling everything. If I want to
use Clojure on the command line, I want to write "clojure-pypy
mytool.cljp".
Does this seem reasonable?
I was thinking about starting this rather soon. For me it would be a
research type project (at least at first). Would anybody care to work
with me on this?
I would love to help out on this project. A few things to think about:
1) PyPy takes a very, very loose view of "bytecode". That is, there is
no reason to execute "bytecode" inside your interpreter. Instead...in
LISP data=code. So why not just allow any arbitrary object to be
passed to eval()? Basically this means you can write a pure LISP
interpreter...and PyPy will write the JIT.
2) It should be possible to implement only a very, very small subset
of Clojure in RPython, then write the rest in Clojure. ClojureScript
does a very, very good job at this. The idea is that you define
deftype, defprotocol, if, def, and a few other functions, then
implement 100% of the rest of the code via these functions. Using
arrays, it's possible to implement PersistentHashMaps, Vectors, etc.
And then the nice thing is, you can completely modify the underlying
interpreter at will, and not have to modify any code. I guess what I'm
suggesting, is that you abstract Clojure from the VM that it operates
on. As long as the compiler knows how to deftype, defprotocol, etc.,
it can run your Clojure subset.
I guess the above two points are mostly to keep you from having to
re-implement code in the future, or having to spend a ton of time on
useless code. I've built a bytecode system in PyPy and it's no fun. As
it is, you'll have fun with just basic things like figuring out how to
add a float and an int.
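On point 2, here is a small Python 3 sketch of how a persistent vector can be built from nothing but fixed-size arrays (tuples here). It is a simplified bit-partitioned trie with path copying: branching factor 4 instead of Clojure's 32, and no tail optimization. The names PVector, conj, and nth mirror Clojure's vocabulary, but the code is an illustration, not a port of PersistentVector.

```python
BITS = 2
WIDTH = 1 << BITS   # 4 slots per node
MASK = WIDTH - 1


def _new_path(level, value):
    """Build a fresh single-child path from `level` down to a leaf holding value."""
    if level == 0:
        return (value,)
    return (_new_path(level - BITS, value),)


def _push(node, level, index, value):
    """Return a copy of node with value added at index; untouched children are shared."""
    if level == 0:
        return node + (value,)
    sub = (index >> level) & MASK
    if sub < len(node):
        child = _push(node[sub], level - BITS, index, value)
        return node[:sub] + (child,) + node[sub + 1:]
    return node + (_new_path(level - BITS, value),)


class PVector:
    def __init__(self, count=0, shift=0, root=()):
        self.count = count
        self.shift = shift  # number of index bits consumed above the leaves
        self.root = root

    def nth(self, i):
        if not 0 <= i < self.count:
            raise IndexError(i)
        node, level = self.root, self.shift
        while level > 0:
            node = node[(i >> level) & MASK]
            level -= BITS
        return node[i & MASK]

    def conj(self, value):
        if self.count == 1 << (self.shift + BITS):
            # Tree is full: add a new root one level up.
            root = (self.root, _new_path(self.shift, value))
            return PVector(self.count + 1, self.shift + BITS, root)
        root = _push(self.root, self.shift, self.count, value)
        return PVector(self.count + 1, self.shift, root)


v0 = PVector()
v1 = v0.conj("a")
v2 = v1.conj("b")
```

Each conj returns a new vector and leaves the old one usable, so v1 still has one element after v2 is created.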
Timothy
I slapped it up on github...it's ugly, but it's a start:
https://github.com/halgari/clj-pypy
The idea here is simple. Every object in the VM must implement
evaluate(self). Functions must then implement invoke(self, args). Most
objects will simply return themselves when evaluate is called, there
are some exceptions:
Var objects return the result of evaluating their most recent binding
Symbols find a var that corresponds to their name, and run evaluate() on that
Lists have the following code in evaluate():
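    def evaluate(self):
        # Evaluate the head of the list to get the function object.
        f = self.first().evaluate()
        print "f = ", f  # debug output
        # Evaluate each remaining item to build the argument list.
        args = []
        h = self.rest()
        while h is not None:
            args.append(h.first().evaluate())
            h = h.rest()
        return f.invoke(args)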
Basically lists evaluate all their arguments, then construct an arglist
and pass the arguments to the first item in the list. The awesome
thing about this is that this all would be quite expensive in most
VMs, but the majority of this will be optimized away by the PyPy JIT.
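For readers who want to run the idea without cloning the repo, here is a self-contained Python 3 sketch of the object model described above: every object has evaluate(), functions have invoke(args), Vars evaluate their most recent binding, and Symbols look up a Var by name. The class names (W_Object, W_Int, W_Fn, Cons) and the GLOBALS table are assumptions for this sketch and are not taken from clj-pypy.

```python
class W_Object:
    def evaluate(self):
        return self  # most objects evaluate to themselves


class W_Int(W_Object):
    def __init__(self, value):
        self.value = value


class W_Fn(W_Object):
    def __init__(self, impl):
        self.impl = impl

    def invoke(self, args):
        return self.impl(args)


class Var(W_Object):
    def __init__(self, binding):
        self.bindings = [binding]

    def evaluate(self):
        return self.bindings[-1].evaluate()  # most recent binding


GLOBALS = {}  # symbol name -> Var


class Symbol(W_Object):
    def __init__(self, name):
        self.name = name

    def evaluate(self):
        return GLOBALS[self.name].evaluate()


class Cons(W_Object):
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail

    def first(self):
        return self.head

    def rest(self):
        return self.tail

    def evaluate(self):
        f = self.first().evaluate()
        args = []
        h = self.rest()
        while h is not None:
            args.append(h.first().evaluate())
            h = h.rest()
        return f.invoke(args)


GLOBALS["+"] = Var(W_Fn(lambda args: W_Int(sum(a.value for a in args))))

# (+ 1 2)
form = Cons(Symbol("+"), Cons(W_Int(1), Cons(W_Int(2))))
result = form.evaluate()
```

Nested forms work the same way, since an argument that is itself a Cons is evaluated before being passed to invoke.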
I have to admit, I'm very excited about this project. PyPy's JIT goes
bug nuts when it can work with immutable structures; I think we'll
start to realize over time that the tracing JIT is an excellent fit for
a LISP-like language. And the fact that (at least in my VM) we don't
have bytecode means that we can truly have code be the same as
data.
BTW...the code on github probably won't compile with PyPy yet. I just
ran it in Python 2.7. I want to get the code in scratchspace.clj
working first then I'll get it to run in RPython.
Timothy, you may want to have a look at my "Scheme in Python"
interpreter. There might be some overlaps. The difference is of course
that Scheme requires the interpreter to be fully tail recursive, so
you'll see trampolining and continuations all over the code.
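For anyone unfamiliar with trampolining, a minimal Python 3 sketch of the technique: instead of making a tail call directly, a function returns a "bounce" describing the next call, and a driver loop runs bounces until a real value comes back. The names Bounce and trampoline are made up here and are not taken from PScheme.

```python
class Bounce:
    """A deferred tail call: call fn(*args) later, from the driver loop."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def trampoline(result):
    # Keep running deferred calls until something other than a Bounce appears.
    while isinstance(result, Bounce):
        result = result.fn(*result.args)
    return result


def count_down(n, acc=0):
    if n == 0:
        return acc
    return Bounce(count_down, n - 1, acc + 1)  # tail call, not a real call


total = trampoline(count_down(100000))
```

The Python stack never grows past a couple of frames, so 100000 "recursive" steps run without hitting the recursion limit.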
One design decision I made was structuring the whole interpreter as a
set of stream processing routines. That is the reader takes a stream of
characters and produces a stream of tokens, which goes to the parser (1)
producing a stream of s-expressions, which goes to evaluator...
(1) here should come the macro expander but I haven't finished it yet.
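The stream-processing structure maps nicely onto Python generators. Below is a small Python 3 sketch of the same shape (characters -> tokens -> s-expressions -> values) for a toy arithmetic language. It is an illustration of the pipeline idea only; none of the function names or details are taken from PScheme, and there is no macro expansion stage.

```python
def tokens(chars):
    """Stream of characters -> stream of tokens."""
    buf = []
    for ch in chars:
        if ch in "()" or ch.isspace():
            if buf:
                yield "".join(buf)
                buf = []
            if ch in "()":
                yield ch
        else:
            buf.append(ch)
    if buf:
        yield "".join(buf)


def parse(token_stream):
    """Stream of tokens -> stream of complete s-expressions."""
    stack = []
    for tok in token_stream:
        if tok == "(":
            stack.append([])
            continue
        if tok == ")":
            item = stack.pop()
        else:
            item = int(tok) if tok.lstrip("-").isdigit() else tok
        if stack:
            stack[-1].append(item)
        else:
            yield item


def _eval(expr):
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        raise ValueError("unbound symbol: %s" % expr)
    op, *args = expr
    values = [_eval(arg) for arg in args]
    if op == "+":
        return sum(values)
    if op == "*":
        product = 1
        for value in values:
            product *= value
        return product
    raise ValueError("unknown operator: %s" % op)


def evaluate(exprs):
    """Stream of s-expressions -> stream of results."""
    for expr in exprs:
        yield _eval(expr)


results = list(evaluate(parse(tokens("(+ 1 2) (* 2 (+ 3 4))"))))
```

Because every stage is lazy, the first form is fully evaluated before the second one is even tokenized, which is what a REPL reading from stdin would want.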
The code is on github:
https://github.com/andrzej-r/PScheme
Cheers,
Andrzej
On 11/22/2011 02:10 PM, Timothy Baldridge wrote:
> So I got thinking about clojure pypy tonight, and got thinking how
> easy it would be to adapt my old code to run as a interpreter. So I
> pulled in a few files, implemented a few methods, and I have prototype
> running (+ 1 2) as interpreted lisp code.
>
> I slapped it up on github...it's ugly, but it's a start:
> https://github.com/halgari/clj-pypy
>> I also felt that sticking with the official Java implementation of Clojure would be more practical. It would certainly be fun to put Clojure on PyPy, though.
> There is one insanely off-the-wall idea I've been thinking about
> recently, however:
> 1) Implement a full JVM in PyPy using GNU Classpath. Write a Java
> bytecode interpreter in PyPy. This way you get the power of PyPy
> (tracing JIT) with the power of Java (classpath)
Seeing as VMKit is a method-level JIT, and PyPy creates tracing JITs,
basing a JVM off of VMKit to run Clojure on it kind of defeats the
whole purpose.
http://tratt.net/laurie/tech_articles/articles/fast_enough_vms_in_fast_enough_time
Pretty cool stuff.
Will clojure-py allow us to write our own VMs using Clojure?
TL;DR: yes
Long version: RPython simply is a restriction on what bytecodes can do
in a given set of branches in a python program. So the cool thing
about RPython is, you define a main(argv[]) function, and then point
the PyPy translator at that function. Any code touched by that
function must conform to the RPython restrictions. But any code used
to generate that function can use standard Python.
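A sketch of that split, assuming the target(...) convention from PyPy's interpreter-writing tutorial, where the translator is pointed at a file and target returns the entry point. The table-building helper uses eval(), which is fine because it runs at import time in ordinary Python; only entry_point and what it calls would need to obey the RPython rules. The names _build_table, SQUARES, and sum_squares are made up, and whether this exact body passes the translator has not been checked.

```python
def _build_table():
    # Runs once at import time, so any dynamic Python is allowed here.
    return [eval("%d * %d" % (i, i)) for i in range(10)]


# By the time the translator looks at the program, this is just a
# constant list of ints.
SQUARES = _build_table()


def sum_squares(args):
    total = 0
    for arg in args:
        total += SQUARES[int(arg)]
    return total


def entry_point(argv):
    # Everything reachable from here must stay within the restrictions.
    print(sum_squares(argv[1:]))
    return 0


def target(*args):
    # Convention assumed from the PyPy tutorial: hand back the entry point.
    return entry_point, None
```

Run under plain Python, entry_point(["prog", "2", "3"]) prints 13 and returns 0, so the program can be debugged normally before any translation is attempted.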
So writing an RPython program is very much possible in clojure-py.
However the restrictions of RPython start to manifest themselves a bit
more in clojure-py. For instance most of clojure.core will be totally
useless to you. RPython states that "any function can take one and
only one type for each argument". This means that the following code
will not compile via RPython.
(defn foo [x] x)
(print (foo 1))
(print (foo "2"))
Now, the way we get around this is by wrapping everything:
(defprotocol W)
(deftype W_int [x]
  W
  (toString [self] x.__str__))
(deftype W_string [x]
  W
  (toString [self] x.__str__))
Now we can do what we want:
(defn foo [x] (.toString x))
(print (foo (W_int. 1)))
(print (foo (W_string. "2")))
I'm hoping macros and the like will help us get around a lot of these
issues, but still, it's going to take some knowledge of how RPython
works to get clojure-py to work with it.
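The same wrapping trick written directly in Python 3, for comparison. Every value the program passes around is an instance of one common base class, so a function like foo only ever sees a single argument type, and the differences live behind a method. The class and method names (W_Root, W_Int, W_String, to_string) are chosen for this sketch.

```python
class W_Root:
    """Common base class: the one type every wrapped value shares."""

    def to_string(self):
        raise NotImplementedError


class W_Int(W_Root):
    def __init__(self, value):
        self.value = value

    def to_string(self):
        return str(self.value)


class W_String(W_Root):
    def __init__(self, value):
        self.value = value

    def to_string(self):
        return self.value


def foo(w_obj):
    # The argument is always "a W_Root", whatever it wraps.
    return w_obj.to_string()


outputs = [foo(W_Int(1)), foo(W_String("2"))]
```

The cost is an allocation per value, which is the kind of boxing the JIT discussion earlier in the thread expects to be optimized away in hot loops.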
Timothy
The big question for me is how this relates to macros. This sounds
like a metaprogramming ability, where instead of changing the source
code, you are changing the implementation layer.
Here is a concrete use case I'm interested in: optimizing algorithms.
I have some set of algorithms that needs such-and-such operations to
be as fast as possible. Can I create a VM that is tailored for that?
For example, tailor a VM around core.logic. Or would this be a silly
thing to do?
There is also a third case, intermediate between optimizing vms for
algorithms and optimizing for full blown languages.
Think about implementing Erlang-style actors. They send messages, have
a mailbox, and reduce a certain number of forms when their turn comes.
Ignoring everything else in the language (the primitives in the actor
body that do the computation associated with a message), this skeleton
system requires a certain kind of VM. Can we reify the Clojure VM to
enable these properties?
It would be pretty impressive if one could parameterize the VM space,
and access different parts of that space within a contiguous clojure
program.
Yes, this is basically what PyPy does for regexes: they have a custom
regex engine that has "can_enter_jit" in it. So basically what you get
is a jitted regex engine. The results are astounding:
http://morepypy.blogspot.com/2010/06/jit-for-regular-expression-matching.html
They show an 8x speed improvement over the Java regex engine. Now for
core.logic...that's a bit of a different story.
The thing about tracing JITs is that they excel at making small tight
loops extremely efficient, so depending on the implementation of
core.logic, that may or may not apply.
>> It would be pretty impressive if one could parameterize the VM space,
>> and access different parts of that space within a contiguous clojure
>> program.
So that's a bit hard due to the way PyPy implements types...or
doesn't. PyPy does not define what a type is at all. Instead it leaves
the typing system open to the developer of the interpreter. So a C#
interpreter may define a type's vtable as a list, and use ids to
figure out what method to run, while a Python interpreter may just use
a dictionary to look up a method on a given type. Because of this, the
JIT really has no way to perform interop between two types. On top of
that, PyPy does not allow loading any modules at runtime. All interop
with outside libs must be through FFI (like ctypes in Python). So
actually the standard library in PyPy is 100% pure Python code that
makes FFI calls to C libraries. In CPython, there is a mixture of C
and Python code.
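To illustrate why two interpreters built on the same toolchain end up with incompatible notions of "type", here is a Python 3 sketch of the two method-lookup strategies mentioned above: a vtable stored as a list indexed by a precomputed method id, and a per-type dictionary keyed by name. Both classes and the METHOD_IDS table are invented for this example and do not reflect how any real C# or Python implementation on PyPy lays out its types.

```python
# Strategy 1: vtable as a list; method ids are fixed ahead of time.
METHOD_IDS = {"area": 0, "name": 1}


class VTableType:
    def __init__(self, methods):
        self.vtable = [None] * len(METHOD_IDS)
        for name, fn in methods.items():
            self.vtable[METHOD_IDS[name]] = fn

    def lookup(self, method_id):
        return self.vtable[method_id]  # plain index, no hashing


# Strategy 2: methods kept in a dictionary and looked up by name.
class DictType:
    def __init__(self, methods):
        self.methods = dict(methods)

    def lookup(self, name):
        return self.methods[name]


square_methods = {
    "area": lambda side: side * side,
    "name": lambda side: "square",
}

static_square = VTableType(square_methods)
dynamic_square = DictType(square_methods)

area_static = static_square.lookup(METHOD_IDS["area"])(3)
area_dynamic = dynamic_square.lookup("area")(3)
```

Both calls compute the same thing, but a caller written against one lookup protocol cannot use the other type without an adapter, which is the interop problem in miniature.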
Timothy