So I kind of wanted to ask this question on the pypy mailing list.. but there's only a pypy-dev list, and I don't want to put noise on the dev list.
What's the point of RPython? By this, I don't mean "What is RPython"? I get that. I mean, why?
The goals of the pypy project seems to be to create a fast python implementation. I may be wrong about this, as the goals seem a little amorphous if you look at their home page.
So, to do that this RPython compiler was created? Except that it doesn't compile python, it compiles a static subset of python that has type inference like ML.
This RPython thing seems like a totally unnecessary intermediate step. Instead of writing an implementation of one language, there's an implementation of two languages.
Actually, it seems like it harms the ultimate goal. For the interpreted pyton to be fast, the interpreter has to be written in a reasonably fast language. ML, and C++ compilers have had a lot of work put into their optimization steps. A lot of the performance boost you get from a static language is that knowing the types at compile time lets you inline code like crazy, unroll loops, and even execute code at compile time.
RPython is a statically typed language because I guess the developers associate static languages with speed? Except that they use it to generate C code, which throws away the type information they need to get the speed increase. Huh? I thought the goal was to write a fast dynamic language, not a slow static one?
Is this going anywhere or is this just architecture astronautics?
The RPython project seems kind of interseting to me and I'd like to see more python implementations, but looking at the project I can't help but think that they haven't really explained *why* they are doing the things they are doing.
Anyway, I can tell this is the sort of question that some people will interpret as rude. Asking hard questions is never polite, but it is always necessary :)
> So I kind of wanted to ask this question on the pypy mailing list.. > but there's only a pypy-dev list, and I don't want to put noise on the > dev list.
> What's the point of RPython? By this, I don't mean "What is RPython"? > I get that. I mean, why?
> The goals of the pypy project seems to be to create a fast python > implementation. I may be wrong about this, as the goals seem a little > amorphous if you look at their home page.
> So, to do that this RPython compiler was created? Except that it > doesn't compile python, it compiles a static subset of python that has > type inference like ML.
> This RPython thing seems like a totally unnecessary intermediate step. > Instead of writing an implementation of one language, there's an > implementation of two languages.
> Actually, it seems like it harms the ultimate goal. For the > interpreted pyton to be fast, the interpreter has to be written in a > reasonably fast language. ML, and C++ compilers have had a lot of work > put into their optimization steps. A lot of the performance boost you > get from a static language is that knowing the types at compile time > lets you inline code like crazy, unroll loops, and even execute code > at compile time.
> RPython is a statically typed language because I guess the developers > associate static languages with speed? Except that they use it to > generate C code, which throws away the type information they need to > get the speed increase. Huh? I thought the goal was to write a fast > dynamic language, not a slow static one?
> Is this going anywhere or is this just architecture astronautics?
> The RPython project seems kind of interseting to me and I'd like to > see more python implementations, but looking at the project I can't > help but think that they haven't really explained *why* they are doing > the things they are doing.
> Anyway, I can tell this is the sort of question that some people will > interpret as rude. Asking hard questions is never polite, but it is > always necessary :)
One of pypy's goals is having a more flexible implementation, easier to experiment with than cpython. Having all written in the same language facilitates this task. Why Rpython? Because by avoiding the dinamicity of the whole python language, it is possible to translate this code to a static one (c, java, whatever). The translated rpython interpreter aims to be something similar in performance to cpython. The only difference is that cpython was written from scratch in c, while pypy's interpreter was written in rpython and then translated automatically to c.
The pypy team intends to achieve greater speed and performance by adding a JIT compiler to this interpreter. So pypy-c + JIT = faster python.
Speed is mentioned, but as a secondary concern. The main goal seems to be to create a vehicle into exploring the concept of dynamic languages themselves. If that seems amorphous then it is because it is a research project.
On Jan 17, 10:37 am, "Brendan Miller" <catph...@catphive.net> wrote:
> What's the point of RPython? By this, I don't mean "What is RPython"? > I get that. I mean, why?
This is more or less covered in the FAQ:
"The restrictions are to ensure that type inference (and so, ultimately, translation to other languages) of RPython programs is possible. These restrictions only apply after the full import happens, so at import time arbitrary Python code can be executed."
> The goals of the pypy project seems to be to create a fast python > implementation. I may be wrong about this, as the goals seem a little > amorphous if you look at their home page.
The home page itself is ambiguous, and does oversell the performance aspect. The *actual* goal as outlined by their official docs is to implement Python in Python, at every level. If this means utilising a less-dynamic subset of Python for the lower levels, its -still- at least more-Python-than-C.
"PyPy is both: * a reimplementation of Python in Python, and * a framework for implementing interpreters and virtual machines for programming languages, especially dynamic languages."
The PyPy devs feel that this will allow them to more easily experiment with optimising the interpreter for greater speeds, but that isn't one of the stated goals (just, apparently, their 'secret' one).
> This RPython thing seems like a totally unnecessary intermediate step. > Instead of writing an implementation of one language, there's an > implementation of two languages.
My understanding is that the 'higher' level Python language features are implemented on top of the restricted RPython features, so it's more of an organic growth of one language than two separate implementations.
> A lot of the performance boost you > get from a static language is that knowing the types at compile time > lets you inline code like crazy, unroll loops, and even execute code > at compile time.
> RPython is a statically typed language because I guess the developers > associate static languages with speed?
Doesn't the first paragraph above state the same thing that you question in the next?
> Except that they use it to > generate C code, which throws away the type information they need to > get the speed increase. Huh? I thought the goal was to write a fast > dynamic language, not a slow static one?
They're not just generating C code, they currently also target LLVM, JVM and CLI.
> Is this going anywhere or is this just architecture astronautics?
Well, they actually seem to have achieved some substantial gains, so I think it's unfair to imply that the project isn't based on pragmatic objectives. Their initial JIT prototype failed to work as well as expected, and they've spent some time re-thinking the approach; I'm happier to wait for a stably performing JIT that can be improved over time than a short term gain.
> The RPython project seems kind of interseting to me and I'd like to > see more python implementations, but looking at the project I can't > help but think that they haven't really explained *why* they are doing > the things they are doing.
A lot of this is covered in the FAQ. Whether you agree with their approach or not, they're the ones actively pushing this effort forward.
It's been a few months since the last update, but the PyPy status blog may have more information for you. At the very least, it's a venue to discuss your concerns directly with the PyPy devs.
>> The goals of the pypy project seems to be to create a fast python >> implementation. I may be wrong about this, as the goals seem a little >> amorphous if you look at their home page.
> The home page itself is ambiguous, and does oversell the performance > aspect. The *actual* goal as outlined by their official docs is to > implement Python in Python, at every level.
Ok fair enough. In some ways I see that as more of a purely intellectual exercise than the practical endeavor that I assumed the project was originally.
However, one of the links I was sent had one of the devs talking about using the translation process to make C/Java/LLVM implementations out of the same interpreter code. I'll say that makes a lot of sense.
Another question I was wondering about is whether they plan on maintaining good C bindings? Will existing bindings for third party libraries be able to work?
Also, are they going to do away with the GIL? The python devs seem to consider the GIL a non-issue, though they may change their mind in 3 years when we all have 32 core desktops, until then getting rid of the GIL would make pypy pretty attractive in some quarters. I know the scons project was running into GIL issues.
Finally, I'm pretty unclear on what versions of python that pypy is targeting.
On Jan 18, 9:55 am, "Brendan Miller" <catph...@catphive.net> wrote:
> > The *actual* goal as outlined by their official docs is to > > implement Python in Python, at every level. > Ok fair enough. In some ways I see that as more of a purely > intellectual exercise than the practical endeavor that I assumed the > project was originally.
Well, it opens up Python as a whole for experimentation & modification by Python developers, which seems pretty practical to me :)
> Another question I was wondering about is whether they plan on > maintaining good C bindings? Will existing bindings for third party > libraries be able to work?
> The python devs seem to > consider the GIL a non-issue, though they may change their mind in 3 > years when we all have 32 core desktops, until then getting rid of the > GIL would make pypy pretty attractive in some quarters. I know the > scons project was running into GIL issues.
It's not a case of the Python devs digging their heels in and not giving the world what it desperately wants. Here's an article by Guido talking about the last attempt to remove the GIL and the performance issues that arose:
"I'd welcome a set of patches into Py3k *only if* the performance for a single-threaded program (and for a multi-threaded but I/O-bound program) *does not decrease*."
alex23 <wuwe...@gmail.com> writes: > Here's an article by Guido talking about the last attempt to remove > the GIL and the performance issues that arose:
> "I'd welcome a set of patches into Py3k *only if* the performance for > a single-threaded program (and for a multi-threaded but I/O-bound > program) *does not decrease*."
The performance decrease is an artifact of CPython's rather primitive storage management (reference counts in every object). This is pervasive and can't really be removed. But a new implementation (e.g. PyPy) can and should have a real garbage collector that doesn't suffer from such effects.
Since this is a PyPy bashing thread, maybe it's an appropriate place to suggest that the project has got a little bit waylaid by exploring cool things instead of releasing a useful final result?
I am not questioning rpython directly - the case for something like that is obvious. But there's a question of balance. It's possible to go on building ever more complex systems which are theoretically justified, but which postpone ever finishing the job. At some point there has to be a "good enough".
To some extent I am playing devil's advocate here, but as an outside who looked at PyPy a while back, my uninformed and naive impression was that the project was suffering from the kid of issues I have caricatured above....
Andrew
PS I guess you are aware of worse is better etc? I think this may also be a US/Euro culture issue...
> Since this is a PyPy bashing thread, maybe it's an appropriate place > to suggest that the project has got a little bit waylaid by exploring > cool things instead of releasing a useful final result?
> I am not questioning rpython directly - the case for something like > that is obvious. But there's a question of balance. It's possible to > go on building ever more complex systems which are theoretically > justified, but which postpone ever finishing the job. At some point > there has to be a "good enough".
> To some extent I am playing devil's advocate here, but as an outside > who looked at PyPy a while back, my uninformed and naive impression > was that the project was suffering from the kid of issues I have > caricatured above....
> Andrew
> PS I guess you are aware of worse is better etc? I think this may > also be a US/Euro culture issue...
It's worth adding that, regarding the goal of having a faster python, we have had for a long time a solid and useful proof-of-concept in psyco. Pypy rolls up on this concept and adds many more useful features, not all of them related to speed but to flexibility. I have no doubt the project will be succesful. The question is how long we will have to wait...
<"http://phr.cx"@nospam.invalid> wrote: > alex23 <wuwe...@gmail.com> writes: >> Here's an article by Guido talking about the last attempt to remove >> the GIL and the performance issues that arose:
>> "I'd welcome a set of patches into Py3k *only if* the performance for >> a single-threaded program (and for a multi-threaded but I/O-bound >> program) *does not decrease*."
> The performance decrease is an artifact of CPython's rather primitive > storage management (reference counts in every object). This is > pervasive and can't really be removed. But a new implementation > (e.g. PyPy) can and should have a real garbage collector that doesn't > suffer from such effects. > -- > http://mail.python.org/mailman/listinfo/python-list
That's interesting, I hadn't heard the reference counting mechanism was related to the GIL. Is it just that you need to lock the reference count before mutating it if there's no GIL? Really, that shouldn't be the case. Reference counting can be done with a lock free algorithm.
Garbage collection is definitely in vogue right now. However, people tend to treat it more like a religion than an algorithm. Garbage collection vs reference counting actually has some trade offs both ways. GC gets you some amortized performance gains, and some space gains because you don't need to hang on to a bunch of counts. However, GC also has the problem of having a very loose memory profile and poor interactive performance during compaction if the heap is large. In some cases this discussion becomes complicated with python because python has both reference counting and GC.
"Brendan Miller" <catph...@catphive.net> writes: > That's interesting, I hadn't heard the reference counting mechanism > was related to the GIL. Is it just that you need to lock the reference > count before mutating it if there's no GIL?
Yes. Someone tried inserting such a lock but it slowed down the single-cpu case unacceptably.
> Really, that shouldn't be > the case. Reference counting can be done with a lock free algorithm.
That's interesting, got a reference? (no pun intended)
I found a paper by Detlefs et al describing a method which is
a) patented b) can potentially lock out some threads from ever running, and c) relies on a hardware instruction (double compare and swap) that's not available on most processors.
Maybe I'm missing something here but a lock free algorithm for reference counting seems pretty trivial. As long as you can atomically increment and decrement an integer without locking you are pretty much done.
For a reference implementation of lock free reference counting on all common platforms check out boosts implementation of shared_ptr (a reference counting smart pointer designed around multithreaded use cases).
>There are well known concurrent and parallel GC techniques that
Hmm... I didn't really mention poor parallelism as a problem of GC. As I see it the two trade offs that have to be made for GC vs alternative techniques. The "embarrassing pause" during compaction which makes it impossible to use for applications like interactive video display that can't halt to compact a several gigabyte heap without causing stutter, and the loose memory profile.
Maybe the document you sent me addresses those, and I'd be interested if it did. It's 150~ pages though so I haven't really had time to read it yet.
As far as parallelism problems with GC go... the only ones I can imagine is that if you had a lot of threads going generating lots of garbage you would need to start to compact more frequently. Since compaction halts all threads, this could potentially cause very frequent compactions? Is that what you were getting at? I'd wondered about that before, but didn't know for a fact whether it came up in real world scenarios.
"Brendan Miller" <catph...@catphive.net> writes: > Maybe I'm missing something here but a lock free algorithm for > reference counting seems pretty trivial. As long as you can atomically > increment and decrement an integer without locking you are pretty much > done.
What cpu's do you know of that can atomically increment and decrement integers without locking?
On Jan 19, 8:00 pm, "Brendan Miller" <catph...@catphive.net> wrote:
> Maybe I'm missing something here but a lock free algorithm for > reference counting seems pretty trivial. As long as you can atomically > increment and decrement an integer without locking you are pretty much > done.
You're missing that most of the platforms that Python supports can't actually do this. Keep in mind that it has to be atomic across all cores.
(For Python to use such a technique, it would have to be doable on every platform Python supports. They aren't going to get rid of the GIL for some platforms and not others.)
> For a reference implementation of lock free reference counting on all > common platforms check out boosts implementation of shared_ptr (a > reference counting smart pointer designed around multithreaded use > cases).
I just looked at the boost documentation, which claims that multiple asynchronous writes to the same shared_ptr results in undefined behavior. That will not suffice for Python reference counting.
"Brendan Miller" <catph...@catphive.net> writes: > As long as you can atomically increment and decrement an integer > without locking you are pretty much done.
Most cpu's can't do that.
> For a reference implementation of lock free reference counting on all > common platforms check out boosts implementation of shared_ptr (a > reference counting smart pointer designed around multithreaded use > cases).
That sounds very mysterious to me--are you sure it is intended for multiprocessing and not just multiple threads on a single processor? Do you have a url for the code?
> I see it the two trade offs that have to be made for GC vs alternative > techniques. The "embarrassing pause" during compaction which makes it > impossible to use for applications like interactive video display that > can't halt to compact a several gigabyte heap without causing stutter, > and the loose memory profile.
The embarassing pause is characteristic of a so-called "stop the world" gc. A gc where any pauses are guaranteed bounded in size (preferably to a small value) is called a "real time" gc. I believe Cheng's algorithm is a real time alg but it's been a while since I read that thesis.
Anyway, nobody programs interactive video in Python, and Python's ref counting scheme is not real time since it may have to do arbitrarily many decrefs when you release a large structure.
> As far as parallelism problems with GC go... the only ones I can > imagine is that if you had a lot of threads going generating lots of > garbage you would need to start to compact more frequently. Since > compaction halts all threads, this could potentially cause very > frequent compactions? Is that what you were getting at?
I'm not sure what you're asking. Parallelism is used for making things go faster, but making things parallel usually complicates them. A parallel realtime gc is sort of the holy grail.
Cheng's thesis is pretty readable as I remember, and discusses most of these issues. There is also a book by Appel about gc, but it's perhaps a little bit dated by now.
On 17 Jan., 01:37, "Brendan Miller" <catph...@catphive.net> wrote:
> Is this going anywhere or is this just architecture astronautics?
> The RPython project seems kind of interseting to me and I'd like to > see more python implementations, but looking at the project I can't > help but think that they haven't really explained *why* they are doing > the things they are doing.
Remember that the original objective of PyPy was to improve the JIT. Psyco is limited by the fact that the whole runtime is implemented in C. The infamous "faster than C" actually refers to work on program specializers on C code i.e. treating C as a language that is JIT compiled on a fine grained level ( block structure - whole function JIT compilation wouldn't obviously yield any advantages ).
So it is not just the application level Python code that shall run through the JIT but also the interpreter level code. So why not equate them and think about interpreter level code as Python as well? This might be the idea. But then the problem of bootstrapping comes up: there is no interpreter level code with the required properties. Hence RPython that can serve as a foundation.
I'm also not sure I like the approach. Rather than designing a whole new runtime for having a fully reflective system I'd model C in Python ( call it CiPy ) and create a bijection between CiPy code and C code. Once this has been established one can study other translations of CiPy or Python into CiPy ( not unlike Pyrex/Cython ) doing systematic refactorings on CiPy code in Python, study properties using runtime reflection, experiment with macro systems etc. All of this is done in PyPy as well but sometimes it seems the team has created more new problems than they solved.
> Anyway, I can tell this is the sort of question that some people will > interpret as rude. Asking hard questions is never polite, but it is > always necessary :)
Carl> I just looked at the boost documentation, which claims that Carl> multiple asynchronous writes to the same shared_ptr results in Carl> undefined behavior. That will not suffice for Python reference Carl> counting.
Carl, I'm quite unfamiliar with Boost and am not a C++ person, so may have read what you saw but not recognized it in the C++ punctuation soup. I couldn't find what you referred to. Can you provide a URL?
s...@pobox.com writes: > Carl, I'm quite unfamiliar with Boost and am not a C++ person, so may have > read what you saw but not recognized it in the C++ punctuation soup. I > couldn't find what you referred to. Can you provide a URL?
<"http://phr.cx"@nospam.invalid> wrote: > s...@pobox.com writes: >> Carl, I'm quite unfamiliar with Boost and am not a C++ person, so may have >> read what you saw but not recognized it in the C++ punctuation soup. I >> couldn't find what you referred to. Can you provide a URL?
I think you are misreading that. It says that multiple assignments to different copies of a share_ptr in different threads are fine. This is the important bit because copies of the same pointer share the same reference count, and assignments and resets will decrement that ref count. They say they don't handle mutations of the *same* pointer in different threads, which is a different issue.
The programmer is responsible for synchronizing access to the pointer, and the pointed to object, but not the ref count. This may be not be obvious if you don't use shared_ptr a lot.
You also mentioned in an earlier post that most processors don't support automic increments... I'm hesitant to dispute you here because this is outside of my field of expertise. However, a quick google search for "x86 atomic increment" comes up with this:
Again, I'm not an assembly guru, but his sounds like exactly what you'd want. It gains exclusive access to system memory in a multi-processor environtment without leaving user space. Thus XADD is an atomic increment/decrement. It would be educational if someone more famliar with x86 than me could speak to the performance merits of this on modern multicore machines.
>I just looked at the boost documentation, which claims that multiple >asynchronous writes to the same shared_ptr results in undefined >behavior. That will not suffice for Python reference counting.
If you read the Boost documentation you'll see that while multiple simulaneous writes to a shared_ptr *instance* isn't supported, multiple simulataneous updates of the reference counter used by the shared_ptr implementaiton is supported. Example #1 in the URL that Paul Rubin gave demonstrates this. The allowed simulanteous assignment of the two shared_ptr instances "p2" and "p3" from the same shared_ptr instance "p", requires that a single reference count be incremented simultenously.
On the other hand, the documentation also notes that the implementation is only lock free on x86, IA-64, and PowerPC systems, though I think it would also be possible on MIPS CPUs.
On Jan 16, 5:37 pm, "Brendan Miller" <catph...@catphive.net> wrote:
> So I kind of wanted to ask this question on the pypy mailing list.. > but there's only a pypy-dev list, and I don't want to put noise on the > dev list.
> What's the point of RPython? By this, I don't mean "What is RPython"? > I get that. I mean, why?
There are some distinct benefits of RPython:
* starts up as real python code, letting you define globals that later get "snapshotted" into static code * can be executed using a real python interpreter, getting much more reliable debugging (but broken rpython code might run fine under python) * provides a clear avenue for extension, by adding more python features
You might argue just having a python syntax is also a benefit, but the semantics are so different as to more than counteract it. Add in the cost of implementing your own compiler... yeah.
On Jan 19, 9:00 pm, "Brendan Miller" <catph...@catphive.net> wrote:
> Maybe I'm missing something here but a lock free algorithm for > reference counting seems pretty trivial. As long as you can atomically > increment and decrement an integer without locking you are pretty much > done.
"lock free" is largely meaningless. What it really means is "we use small hardware locks rather than big software locks, thereby reducing (but not eliminating!) the contention".
Atomic refcounting is easy. If done sparingly to minimize contention it works great. Python uses refcounting massively with heavily contended objects (think about your builtin types, constants, globals, classes, etc.) It does not perform acceptably for python.
The second issue is the objects themselves, like a list which is mutable. If you're using it in a single thread or writing from multiple threads this is a non-trivial constant cost. If your object is not modified after creation and is read from many threads a lock would be a point of contention, preventing you from scaling freely. The dicts used by classes and globals are an import example of this, and a successful implementation needs something non-contending. I assume Jython and IronPython do this.
> Again, I'm not an assembly guru, but his sounds like exactly what > you'd want. It gains exclusive access to system memory in a > multi-processor environtment without leaving user space. Thus XADD is > an atomic increment/decrement. It would be educational if someone more > famliar with x86 than me could speak to the performance merits of this > on modern multicore machines.
Those links describe using the LOCK prefix, which as the name implies, asserts a lock, so it is no longer "lockless reference counting". The LOCK prefix adds about 100 cycles to the instruction.