Hello everyone (from ages back!) - short questions on shedskin and memory sharing

Ian Ozsvald

Nov 16, 2013, 2:05:31 PM
to shedskin...@googlegroups.com
Hello Mark and everyone. I've taught shedskin in classes and at PyCons
over the last couple of years. I'm now writing an O'Reilly book on
High Performance Computing and I'm including some shedskin examples
(using 0.9.4).

I have a Julia Set example (rather similar to my previous Mandelbrot
Set examples for teaching), I'm working through cython (with lists,
arrays and nparrays) and shedskin (and later Numba, maybe Parakeet

In the Julia Set I build a pair of lists of complex numbers (for zs
and cs), 1,000,000 points in each to represent a reasonable input set.
I then pass these into an extension module.
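
A rough sketch of the setup described above might look like the following. The coordinate bounds, the constant `c`, and the function names `build_inputs` and `calculate_z` are assumptions chosen for illustration, not details given in this thread.

```python
def build_inputs(width, x1=-1.8, x2=1.8, y1=-1.8, y2=1.8,
                 c=complex(-0.62772, -0.42193)):
    """Return (zs, cs): one complex start point and one constant per grid cell.

    The bounds and the value of c are assumed values, used only for illustration.
    """
    step_x = (x2 - x1) / width
    step_y = (y2 - y1) / width
    zs = []
    cs = []
    for row in range(width):
        y = y1 + row * step_y
        for col in range(width):
            x = x1 + col * step_x
            zs.append(complex(x, y))
            cs.append(c)
    return zs, cs


def calculate_z(maxiter, zs, cs):
    """Count iterations of z = z*z + c until |z| >= 2 or maxiter is reached."""
    output = [0] * len(zs)
    for i in range(len(zs)):
        n = 0
        z = zs[i]
        c = cs[i]
        # The innermost loop works on local variables only, with no indexing.
        while n < maxiter and abs(z) < 2.0:
            z = z * z + c
            n += 1
        output[i] = n
    return output
```

Calling `build_inputs(1000)` would give two lists of 1,000,000 complex numbers each, which is the size of input mentioned above; `calculate_z` is the kind of function that would be moved into the extension module.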

Can I check a few points?
1) CPython lists (e.g. lists of complex numbers) are always *copied* into
a ShedSkin extmod, so the memory usage is doubled and there's a time
penalty for copying, right? The same will happen with a result list
being passed back too, I guess.
2) I've seen mention that slice notation is a slow way to reference
elements - is this true for single item access (e.g. some_list[0]) or
just for range slicing?
3) I see that the array module is supported - are the underlying bytes
shared between processes using array, or are they copied in too?
4) Did anyone ever look into using OpenMP on parallelisable 'for'
loops? I did years back but never got it to work (a la 'prange' in

We discuss the 'inertia of data' (i.e. copying large numbers of bytes
is expensive) and I'm wondering if ShedSkin has ways of sharing data
with the parent Python process, hence questions 1) and 3)


Ian Ozsvald (A.I. researcher)


Mark Dufour

Nov 17, 2013, 6:36:46 AM
to shedskin-discuss
hello ian,

thanks a lot for this. it's wonderful to hear you are still showing shedskin during your classes, and that I have lived to see the day a book with this title is written.. :-)

unfortunately, nothing is shared at this point between a shedskin-compiled extension module and an importing parent process. so all arguments and returned objects are completely converted on each call. it sometimes requires a bit of puzzling so that this doesn't destroy performance. btw, I don't think arrays can be passed in or out of an extension module at this point. the c64 example at least uses array.tostring to work around this, which is pretty fast (basically a single memcpy).
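
For illustration, the byte round trip that `array.tostring` provides can be sketched as follows. This is only a sketch of the CPython side: it uses the Python 3 method names `tobytes`/`frombytes` (`tostring`/`fromstring` were the Python 2 spellings), and the helper names `pack_floats` and `unpack_floats` are hypothetical, not taken from the c64 example.

```python
from array import array


def pack_floats(values):
    """Serialize a sequence of floats into one bytes object of C doubles."""
    return array('d', values).tobytes()


def unpack_floats(raw):
    """Rebuild an array of doubles from bytes produced by pack_floats."""
    result = array('d')
    result.frombytes(raw)
    return result


raw = pack_floats([0.5, 1.5, 2.5])
restored = unpack_floats(raw)
```

The idea is that a single string of bytes crosses the module boundary instead of a list that has to be converted element by element.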

I've just measured the overhead of passing a list with 32MB of floats into an extension module which just passes 4MB of ints back, and the result is almost exactly 0.05 seconds :-) there's probably not much GC overhead in your example in any case, since there's probably not much heap allocation going on. IIRC the GC is only triggered on calls to 'new'.
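
A measurement like this could be set up roughly as below. The sketch assumes a stand-in function, `stand_in_extmod`, which only imitates the copy-in and copy-out of a compiled function; it is hypothetical and will not reproduce the 0.05 seconds quoted above.

```python
import time


def time_call(func, *args):
    """Return (result, elapsed_seconds) for one call to func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def stand_in_extmod(floats):
    """Hypothetical stand-in for a compiled function: floats in, ints out."""
    copied = list(floats)               # imitates the conversion on the way in
    return [0] * (len(copied) // 4)     # imitates a smaller result on the way out


floats = [0.0] * 100000
result, elapsed = time_call(stand_in_extmod, floats)
print("call took %.4f s for %d floats" % (elapsed, len(floats)))
```

To time a real extension module, `stand_in_extmod` would be replaced by the imported compiled function, with an input of about 4,000,000 floats if 32MB of 8-byte doubles is the target.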

you probably don't want to be using slicing in some inner loop, because this actually creates a new heap object (which the GC has to clean up again). especially if the slices end up being small, the difference with a few item accesses becomes enormous.. item access is quite fast, especially when combined with shedskin -b or even -w. but just -b should usually be safe, and can make a big difference for indexing-heavy code..
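
The difference between the two access styles can be sketched like this. Both function names are made up for illustration; the two functions compute the same total, but the first allocates a temporary two-element list on every iteration.

```python
def pair_sums_sliced(values):
    """Sum adjacent pairs using slices; each slice allocates a new list."""
    total = 0.0
    for i in range(len(values) - 1):
        pair = values[i:i + 2]          # new two-element list per iteration
        total += pair[0] + pair[1]
    return total


def pair_sums_indexed(values):
    """Same result using plain item access; no temporary lists are created."""
    total = 0.0
    for i in range(len(values) - 1):
        total += values[i] + values[i + 1]
    return total


data = [float(i) for i in range(1000)]
```

In an inner loop, the indexed form is the one to prefer, since it leaves nothing for the GC to clean up.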

I'm not aware of other people trying to use OpenMP or similar systems with shedskin, and I have nothing sensible to say about it. though I'm sure you know that it is possible to combine shedskin with for example the multiprocessing module (but again this may require a bit of puzzling). shedskin/examples/pylot_main.py uses 4 processes this way to raytrace an image.
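
The general pattern of combining a compiled function with the multiprocessing module might be sketched as follows. This is not the code from pylot_main.py: `render_band` is a hypothetical stand-in for a function imported from a compiled extension module, and `split_rows` and `render_parallel` are illustrative names.

```python
from multiprocessing import Pool


def split_rows(height, parts):
    """Split range(height) into `parts` contiguous (start, stop) bands."""
    base, extra = divmod(height, parts)
    bands = []
    start = 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


def render_band(band):
    """Hypothetical stand-in for a compiled function that renders some rows."""
    start, stop = band
    return [row * row for row in range(start, stop)]


def render_parallel(height, processes):
    """Render each band in its own process and stitch the rows back together."""
    with Pool(processes) as pool:
        chunks = pool.map(render_band, split_rows(height, processes))
    rows = []
    for chunk in chunks:
        rows.extend(chunk)
    return rows
```

A call such as `render_parallel(480, 4)` should sit under an `if __name__ == "__main__":` guard. Arguments and results are pickled between processes, so the copying cost discussed above applies here as well, which is why each worker receives only a small `(start, stop)` pair.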

please let me know if you have more/further questions, and good luck with the book! proceeding to pre-order a few copies.. :-)



Ian Ozsvald

Nov 17, 2013, 1:00:00 PM
to shedskin...@googlegroups.com
Superb answers, thank you Mark :-) I'm glad to talk about Shed Skin, I
rarely need it in practice (generally I work with numpy so Cython fits
better), but sometimes I throw a problem at Shed Skin and 'it just
works'. An easy life is a good life.

I'm happy to hear that a 32MB float memcpy might explain the 0.05s, I
believe it is the only difference between my Cython+numpy array
example and this Shed Skin+lists example (so Cython is sharing the
nparray data behind the scenes).

The tightest inner loop does no index referencing, so I don't think
that's a problem in my code. I've not tried profiling this yet (that's
for later), so maybe I'll see other surprises (but the timing is just
0.1s slower than the Cython+lists example which won't do any list
copying but will do Python dereferencing, so I think that difference
is explainable too).

Re. multiprocessing - yes, I know about using an extmod, I just
wondered if anyone had tried wrapping an openmp for loop definition
around the generated C (I did that for EuroPython 2011 but didn't get
it to work).

I shall make some additions to the chapter. Many thanks :-)
