I'd like to have a thread with links to blog postings and such where
shedskin is tested or compared with other implementations. to start
with, here's a very recent blog post, where the author tries some
different approaches to get his code to run faster:
shedskin goes from slowest (C++ doesn't like huge amounts of small
allocations!) to fastest (after manual C), after a few minor tweaks.
and it can be faster with -bw (see comments).
here's a recent comparison of mine against psyco. looks like next time
I will have to compare against pypy.. :)
Here is a very complimentary blog article. It does have one hard timing comparison, but it feels more like a general introduction to Shed Skin for those who don't know about Shed Skin yet than an actual benchmarking post. (Which is still good, right? I think this article would have helped support the Wikipedia effort, for example; and could be used for a future attempt to get into Wikipedia.)
I am surprised you didn't list this among your links already, because the blog author mentions that he was contacted directly by Shed Skin's author. ;)
On Sun, Nov 28, 2010 at 4:09 PM, John Yeung <gallium.arsen...@gmail.com>wrote:
> Here is a very complimentary blog article. It does have one hard > timing comparison, but it feels more like a general introduction to > Shed Skin for those who don't know about Shed Skin yet than an actual > benchmarking post. (Which is still good, right? I think this article > would have helped support the Wikipedia effort, for example; and could > be used for a future attempt to get into Wikipedia.)
ah, yes, I forgot about this one, thanks! and good you mention wikipedia. if anyone reading this might want to re-create a wikipedia page, here's a link to the section in lutz's well known "learning python" book that describes shedskin:
> shedskin goes from slowest (C++ doesn't like huge amounts of small > allocations!) to fastest (after manual C), after a few minor tweaks. > and it can be faster with -bw (see comments).
I'm very pleased a few tweaks were enough to make the shedskin version faster than the others (especially the pypy one :p). I was very suprised by the first benchmark's results.
I don't know what kind of allocation is responsible for the overhead there, but I guess escape analysis would help for scalar types… and maybe some kind of factory would help for more complex objects too (if this objects are properly garbage collected).
> here's a recent comparison of mine against psyco. looks like next time > I will have to compare against pypy.. :)
I missed that one, so thanks for the link. It makes me think that it would be interesting to have some benchmarks with clang / dragonegg instead of gcc… especially when comparing to pypy.
> I'm very pleased a few tweaks were enough to make the shedskin version > faster than the others (especially the pypy one :p). I was very > suprised by the first benchmark's results.
in the meantime, shedskin GIT has gone from 77 to about 46 seconds for the original program. it turns out string comparison and slicing were both still not very optimal.. :P
the strangeness of the benchmarks comes from the fact it's essentially a single loop:
for x in range(200000000): s = somestr[x:x+3] if s == 'abc' or s == 'bcd' or s == 'cde': blah
so the crucial aspect is how does the memory subsystem handle all these slices. it looks like pypy uses escape analysis to recycle the slices, or perhaps it caches 3-length strings (shedskin only caches 256 1-length strings at the moment). in any case, when I manually 'reuse' the same string object, I get about the same speed as pypy.
I don't know what kind of allocation is responsible for the overhead
> there, but I guess escape analysis would help for scalar types… and > maybe some kind of factory would help for more complex objects too (if > this objects are properly garbage collected).
yes, I think some scheme to allocate objects N at a time and recycling them somehow can help a lot. but I think in this and other cases, it could also help to rewrite the string class. dropping (indirection to) the STL helped a lot for the set and dict classes as well. unused mutability of std::string probably also costs us.
I missed that one, so thanks for the link.
> It makes me think that it would be interesting to have some benchmarks > with clang / dragonegg instead of gcc… especially when comparing to > pypy.
yes, definitely. I think there's quite a bit that can be optimized outside of shedskin. it feels a bit weird to have to worry about escape analysis and other memory optimizations, if Java has them out-of-the-box.. I would expect the Boehm GC to get in the way though, since it overloads 'new', so that it's impossible to follow what happens there. I also doubt they do much in this area, because the naive output of shedskin is not that typical (looks more like Java.. hm :-)).
> the strangeness of the benchmarks comes from the fact it's essentially a > single loop:
> for x in range(200000000): > s = somestr[x:x+3] > if s == 'abc' or s == 'bcd' or s == 'cde': > blah
> so the crucial aspect is how does the memory subsystem handle all these > slices. it looks like pypy uses escape analysis to recycle the slices, or > perhaps it caches 3-length strings (shedskin only caches 256 1-length > strings at the moment). in any case, when I manually 'reuse' the same string > object, I get about the same speed as pypy.
Since Python strings are immutable, slices should never lead to memory allocations, am I wrong? This leads me to the next point :
>> I don't know what kind of allocation is responsible for the overhead >> there, but I guess escape analysis would help for scalar types… and >> maybe some kind of factory would help for more complex objects too (if >> this objects are properly garbage collected).
> yes, I think some scheme to allocate objects N at a time and recycling them > somehow can help a lot. but I think in this and other cases, it could also > help to rewrite the string class. dropping (indirection to) the STL helped a > lot for the set and dict classes as well. unused mutability of std::string > probably also costs us.
Yes, I entirely agree. That's something I've already thought about several times… std::string is of little help there, and it has a serious overhead for several simple tasks : - slicing - copying - resizing to a shorter string - (maybe with some additional work) working with simple `char' where strings are not needed (eg. for expressions like x = 'a' or if foo[42] == 'z': or even x = 'a' if foo[42] == x: )
When the ticket for issue 74 (bz2) was opened, I wanted to write a quick and dirty implementation. Unfortunately, the current internal representation of strings prevents us from simply returning pointers to arbitrary locations in the memory. Same problem for the `easy' task `improve IO speed': being able to mmap files and return pointers on the memory without any copying would help a lot.
As you may already know, the CPython representation for strings is a pair of pointers: one for the beginning of the string, the other for the end. I think nothing is more efficient than this.
Since Python strings are immutable, slices should never lead to memory
> allocations, am I wrong?
well, if we keep everything copy-by-reference, we'd still have to store the slice start and end somewhere.. but if we use copy-by-value, that would avoid any allocations. though I never really liked copy-by-value. it's ugly and complicates code generation in several ways..
Yes, I entirely agree. That's something I've already thought about
> several times… std::string is of little help there, and it has a > serious overhead for several simple tasks : > - slicing > - copying > - resizing to a shorter string > - (maybe with some additional work) working with simple `char' where > strings are not needed (eg. for expressions like
for set and dict it turned out to be a very good idea to just do it the cpython way.. I also like this approach because it gives the same behaviour, so for example making it faster in cpython makes it faster with shedskin as well.
> x = 'a' > or > if foo[42] == 'z': > or even > x = 'a' > if foo[42] == x: > )
shedskin currently caches all 256 possible 1-length strings, so indexing a string for example doesn't cause an allocation, and working with "characters" is really fast. that's also how I optimized geet's program for shedskin, by simply comparing three characters instead of slicing and then comparing.
I played a bit with an alternative 'char' type some time ago, but that ended up complicating too many things, and because the current approach works practically just as well in many cases, in the end that clearly wasn't worth the effort.
> When the ticket for issue 74 (bz2) was opened, I wanted to write a > quick and dirty implementation. Unfortunately, the current internal > representation of strings prevents us from simply returning pointers > to arbitrary locations in the memory. Same problem for the `easy' task > `improve IO speed': being able to mmap files and return pointers on > the memory without any copying would help a lot.
yeah, that's something to keep in mind. though the main problem with IO speed is that currently everything happens per-character.
> As you may already know, the CPython representation for strings is a > pair of pointers: one for the beginning of the string, the other for > the end. I think nothing is more efficient than this.
no, but it sounds logical at this point.. ;) I'm guessing though that it still allocates a new object for each slice, to contain these pointers.. what I'm hoping is that we can somehow efficiently manage such objects, without needing copy-by-value.. it should at least be possible to allocate many of them at once. deallocation and recycling may be a bit trickier.. :P
> -- > You received this message because you are subscribed to the Google Groups > "shedskin-discuss" group. > To post to this group, send email to shedskin-discuss@googlegroups.com. > To unsubscribe from this group, send email to > shedskin-discuss+unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/shedskin-discuss?hl=en.