I have just released version 0.2 of Shed Skin, an experimental (restricted) Python-to-C++ compiler (http://shedskin.googlecode.com). It comes with 7 new example programs (for a total of 40 example programs, at over 12,000 lines) and several important improvements/bug fixes. See http://code.google.com/p/shedskin/wiki/ReleaseNotes for the full changelog.
The new example programs consist of Disco, an elegant go player (see http://shed-skin.blogspot.com/2009/07/disco-elegant-python-go-player....), a larger Voronoi implementation at 800 lines, a TSP algorithm simulating ant colonies, a nicer neural network algorithm and three compressors (Lempel-Ziv, huffman block, and arithmetic).
Other than bug fixes for these programs, this release adds some important optimizations. First and foremost, inlining was greatly improved, resulting in potential speedups across the board. Second, loops such as 'for a, b in enumerate/zip(sequence[, sequence])' should now be dramatically faster (also inside list comprehensions), by avoiding allocation of intermediate tuples. Finally, basic list slicing should now be much faster.
Please try it out! Mark Dufour. -- "One of my most productive days was throwing away 1000 lines of code" - Ken Thompson
William Dode wrote: > On 19-07-2009, Mark Dufour wrote: >> I have just released version 0.2 of Shed Skin, an experimental >> (restricted) Python-to-C++ compiler (http://shedskin.googlecode.com).
> I just tested it with a litle game, to find the places of horse on > a board 5x5. The result is :
Note that both Psyco and Cython make a lot less assumptions about Python code than Shed Skin does. Psyco has the advantage of just needing to jump in when it finds out that it can help, so it's the most broadly compatible of the three. But Cython also supports quite a large corpus of dynamic Python code by now. Shed Skin has a lot of restrictions, many of which are by design. It's not intended to compile dynamic code, and I think that's a good thing, because that's what makes it fast for the code that it supports. Getting the same speed in Cython requires a bit more explicit typing, simply because Cython does not assume these restrictions.
I think that all three have their raison d'ętre, and it currently looks like all three are there to stay and to keep growing better. And I'm also happy to read that some optimisations jump from one to the other. ;)
Nice timings, can you please show me the Python, Java and C code versions? I may do more tests. The purpose of all those "example programs" in ShedSkin is to find bugs and to find details to speedup.
I read just enough French to know that "avec" means "with", but I don't understand the difference between "avec malloc *int" and "avec []". Can you explain please?
Thx,
-- Skip Montanaro - s...@pobox.com - http://www.smontanaro.net/ That's more than a dress. That's an Audrey Hepburn movie. -- Jerry Maguire
> I read just enough French to know that "avec" means "with", but I don't > understand the difference between "avec malloc *int" and "avec []". Can you > explain please?
Maybe it's the time difference between using a Python list from Cython and using a C "array" allocated with a malloc from Cython.
> Nice timings, can you please show me the Python, Java and C code > versions? I may do more tests.
also, which shedskin options did you use? did you use -bw, to disable bounds and wrap-around checking? this can make quite a difference for code that does a lot of indexing. if the code uses random numbers, then -r can also make a big difference, to use C rand(), instead of Python compatible random numbers.
and which C++ compiler flags did you use? the default -O2, or something like -O3 -fomit-frame-pointer -msse2..?
Like you'll see, i tried to use exactly the same code for each langage.
> also, which shedskin options did you use? did you use -bw, to disable > bounds and wrap-around checking? this can make quite a difference for > code that does a lot of indexing. if the code uses random numbers, > then -r can also make a big difference, to use C rand(), instead of > Python compatible random numbers.
> and which C++ compiler flags did you use? the default -O2, or > something like -O3 -fomit-frame-pointer -msse2..?
I used the default, shedksin cheval.py; make shedskin 0.2
With -bw and -O3 -fomit-frame-pointer -msse2 i have 5.5s (instead of 8)
On 20-07-2009, Bearophile wrote: > Skip Montanaro: >> I read just enough French to know that "avec" means "with", but I don't >> understand the difference between "avec malloc *int" and "avec []". Can you >> explain please?
> Maybe it's the time difference between using a Python list from Cython > and using a C "array" allocated with a malloc from Cython.
Mark Dufour <mark.duf...@gmail.com> wrote: > I have just released version 0.2 of Shed Skin, an experimental > (restricted) Python-to-C++ compiler (http://shedskin.googlecode.com). > It comes with 7 new example programs (for a total of 40 example > programs, at over 12,000 lines) and several important improvements/bug > fixes. See http://code.google.com/p/shedskin/wiki/ReleaseNotes for the > full changelog.
Cool!
I tried it on a program I wrote to solve a puzzle (Rush Hour). (Actually it solves the meta-puzzle - trying to make the hardest possible Rush Hour puzzle.)
After a bit of twiddling (remove psyco and profiling) I got it to start compiling, but unfortunately it compiled for about an hour (in iterative type analysis..) used up 600 MB of RAM printing an '*' every 10 minutes or so then gave an error message and gave up.
Unfortunately I shut the window by accident, but the error message was something about not being able to resolve types I think with a list of 20 or so unresolved types.
Can you give a hint as to how I debug this? I presume my program has some instances of non static types which is causing the problem, but it is going to be a very long debugging session if it takes me an hour each cycle ;-)
The program is about 700 lines of python (excluding comments).
> Can you give a hint as to how I debug this? I presume my program has > some instances of non static types which is causing the problem, but > it is going to be a very long debugging session if it takes me an hour > each cycle ;-)
> The program is about 700 lines of python (excluding comments).
You can show us the Python (SSPython) code, and we can try to find the problem. Sometimes there's no simple ways to solve such problems.
Generally for not very large progrograms if SS doesn't compile in about a minute or so then it's gone in infinite loop (there's a compilation flag that avoids some infinite loops, try it).
Timings (no printing), seconds, best of 3: #1: 4.79 #2: 3.67 #3: 1.10 Your Psyco version: 13.37 Your C version, compiled with GCC 4.3.2, -s -O3 -fomit-frame-pointer: 3.79
I have timed the whole running time of the programs, with an external timer, so my timings include the start of the PythonVM and the final cleanup of the GC.
Please, feel free to time my code again, so we can compare with your other timings (Java, etc) better.
I have used Psyco 1.6 final and Python 2.6.2 on a Core2 CPU at 2 GHz.
In such benchmarks it's often better to start from the fastest low level code (written in an imperative language as C/D/C++, or a version in a functional language like Clean or OCaML) and then use it to create higher level versions (in Python, etc). This offers you a baseline timing, and usually the small differences of the higher level code compared to the original C/Clean code don't invalidate the test.
I have changed only small things in the program, as you can see the Psyco program #2 is faster than your C code :-) If you learn how to use it well, Psyco1.6 can often lead to very good performance. But lot of people don't know how to use Psyco well (I too am ignorant: I don't know how to profile Python programs yet).
> I tried it on a program I wrote to solve a puzzle (Rush Hour). > (Actually it solves the meta-puzzle - trying to make the hardest > possible Rush Hour puzzle.)
> After a bit of twiddling (remove psyco and profiling) I got it to > start compiling, but unfortunately it compiled for about an hour (in > iterative type analysis..) used up 600 MB of RAM printing an '*' every > 10 minutes or so then gave an error message and gave up.
> Unfortunately I shut the window by accident, but the error message was > something about not being able to resolve types I think with a list of > 20 or so unresolved types.
> Can you give a hint as to how I debug this? I presume my program has > some instances of non static types which is causing the problem, but > it is going to be a very long debugging session if it takes me an hour > each cycle ;-)
> The program is about 700 lines of python (excluding comments).
Split it into pieces and compile each separately. Recurse.
William Dode wrote: > I just tested it with a litle game, to find the places of horse on > a board 5x5. The result is :
> cython avec malloc *int 18s
Posting benchmark times for Pyrex or Cython is pretty meaningless without showing the exact code that was used, since times can vary enormously depending on how much you C-ify things.
> Posting benchmark times for Pyrex or Cython is pretty > meaningless without showing the exact code that was > used, since times can vary enormously depending on > how much you C-ify things.
please send any program that doesn't work with shedskin (it still is experimental after all) to me, or create an issue at shedskin.googlecode.com, and I will have a look at the problem.
> please send any program that doesn't work with shedskin (it still is > experimental after all) to me, or create an issue at > shedskin.googlecode.com, and I will have a look at the problem.
On Wed, Jul 22, 2009 at 2:48 AM, Bearophile<bearophileH...@lycos.com> wrote: > greg: >> Posting benchmark times for Pyrex or Cython is pretty >> meaningless without showing the exact code that was >> used, since times can vary enormously depending on >> how much you C-ify things.
It was just a matter of cdef-ing all the arrays & integers -- bearophile already did the hard work :-)
Here's the cython code; all the others are from the repo.
################################################# DEF NMOVES = 8 DEF SIDE = 5 DEF SQR_SIDE = SIDE * SIDE
cdef int circuit[SQR_SIDE] cdef int nsolutions = 0
cdef int movex[NMOVES] cdef int movey[NMOVES] py_movex = [-1,-2,-2,-1,+1,+2,+2,+1] py_movey = [-2,-1,+1,+2,+2,+1,-1,-2] for i in range(NMOVES): movex[i] = py_movex[i] movey[i] = py_movey[i] shift = [x * SIDE + y for x,y in zip(py_movex, py_movey)] cdef int shift_0 = shift[0] cdef int shift_1 = shift[1] cdef int shift_2 = shift[2] cdef int shift_3 = shift[3] cdef int shift_4 = shift[4] cdef int shift_5 = shift[5] cdef int shift_6 = shift[6] cdef int shift_7 = shift[7]
def showCircuit(): print for x in xrange(SIDE): x_SIDE = x * SIDE for y in xrange(SIDE): if SQR_SIDE < 100: print "%02d " % circuit[x_SIDE + y], else: print "%03d " % circuit[x_SIDE + y], print
cdef void solve(int nb, int x, int y, int SIDE=SIDE, int SQR_SIDE=SQR_SIDE, int *circuit=circuit, int shift_0=shift_0, int shift_1=shift_1, int shift_2=shift_2, int shift_3=shift_3, int shift_4=shift_4, int shift_5=shift_5, int shift_6=shift_6, int shift_7=shift_7, ): global nsolutions
cdef int newx, newy cdef int pos = x * SIDE + y circuit[pos] = nb if nb == SQR_SIDE: #showCircuit() nsolutions += 1 circuit[pos] = 0 return
newx = x + -1 if newx >= 0 and newx < SIDE: newy = y + -2 if newy >= 0 and newy < SIDE and not circuit[pos + shift_0]: solve(nb+1, newx, newy)
newx = x + -2 if newx >= 0 and newx < SIDE: newy = y + -1 if newy >= 0 and newy < SIDE and not circuit[pos + shift_1]: solve(nb+1, newx, newy)
newx = x + -2 if newx >= 0 and newx < SIDE: newy = y + 1 if newy >= 0 and newy < SIDE and not circuit[pos + shift_2]: solve(nb+1, newx, newy)
newx = x + -1 if newx >= 0 and newx < SIDE: newy = y + 2 if newy >= 0 and newy < SIDE and not circuit[pos + shift_3]: solve(nb+1, newx, newy)
newx = x + 1 if newx >= 0 and newx < SIDE: newy = y + 2 if newy >= 0 and newy < SIDE and not circuit[pos + shift_4]: solve(nb+1, newx, newy)
newx = x + 2 if newx >= 0 and newx < SIDE: newy = y + 1 if newy >= 0 and newy < SIDE and not circuit[pos + shift_5]: solve(nb+1, newx, newy)
newx = x + 2 if newx >= 0 and newx < SIDE: newy = y + -1 if newy >= 0 and newy < SIDE and not circuit[pos + shift_6]: solve(nb+1, newx, newy)
newx = x + 1 if newx >= 0 and newx < SIDE: newy = y + -2 if newy >= 0 and newy < SIDE and not circuit[pos + shift_7]: solve(nb+1, newx, newy)
circuit[pos] = 0
def main(): print "Search for side=%d" % SIDE cdef int x,y for x in range(SIDE): for y in range(SIDE): solve(1, x, y); print "\n%dx%d case, %d solutions." % (SIDE, SIDE, nsolutions)
def run(): import time s=time.time() main() print time.time()-s #################################################