--
You received this message because you are subscribed to the Google Groups "shedskin-discuss" group.
To post to this group, send email to shedskin...@googlegroups.com.
To unsubscribe from this group, send email to shedskin-discu...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/shedskin-discuss?hl=en.
For Shedskin:
First test = SS 7th
Second test = SS 8th
Third test = SS 11th
Fourth test = SS 24th
Fifth test = SS 24th
Sixth test = SS 18th
Seventh test = SS 21st
Average = 16.14
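This sketch assumes each figure is Shed Skin's ranking in that test. Under that reading, the average is the plain mean of the seven placings (113 / 7). The original results wrote it with a decimal comma. A quick check:

```python
# Shed Skin's placing in each of the seven benchmark tests (assumed to be rankings)
ranks = [7, 8, 11, 24, 24, 18, 21]
average = sum(ranks) / len(ranks)
print(f"{average:.2f}")  # 16.14
```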
By the way, if you could implement code transformations for optimizations as post-processing on the generated C++ code, it would be wonderful!
note that the third and last tests measure memory, so not very interesting I think, unless there are huge differences. and the fourth, fifth and sixth tests will likely go much better with some minor improvements. my hands are itching, but I'd like to wait and see if anyone else looks into them first..
well, to be honest I've got my hands full with the current approach.. there are lots of options for high level optimizations, and escape analysis is my favorite, but I prefer to keep on improving type inference myself.. anyone is welcome to look into additional analyses, and I'd be happy to assist of course.
btw, combining shedskin with ICC might be an interesting experiment. I don't recall anyone doing this.. GCC 4.6 may also perform better than the GCC 4.3 that was used, of course..
1. sys.argv's last argument doesn't get the newline stripped from the
end like it does for CPython.
2. re.compile(regex).search(data) doesn't accept an empty data string,
so the if statement has to be "if line[:-1] and r.search(line[:-1]):"
With those fixed, CPython takes 19 seconds and shedskin takes 11.5
seconds on my (old) desktop. As a ratio, this gives SS a score of 3.5
for patmch:1t. Then with 58 for CPython and 71 for SS, that's a score of
15.4 for patmch:2t.
I was also surprised by the slower speed for 2t, but I'll send the fixed
code to attractivechaos and see what he comes up with.
--fahhem
thanks!
1. sys.argv's last argument doesn't get the newline stripped from the end like it does for CPython.
hmm, I don't see this..? :) (tested windows as well)
2. re.compile(regex).search(data) doesn't accept an empty data string, so the if statement has to be "if line[:-1] and r.search(line[:-1]):"
which version of python are you using? this seems to work fine here. the following makes re.search accept empty strings, though I'm not sure if it's the correct fix.
--- a/shedskin/lib/re.cpp
+++ b/shedskin/lib/re.cpp
@@ -455,6 +455,8 @@ __iter<match_object *> *re_object::finditer(str *subj, __ss_
match_object *re_object::__exec(str *subj, __ss_int pos, __ss_int endpos, __ss_
{
+ if(subj->unit.size() == 0) return (match_object *)NULL;
+
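For comparison, CPython's re module does accept an empty subject. It returns a match whenever the pattern itself can match the empty string. A patch that returns NULL for every empty subject would therefore disagree with CPython for patterns such as a* or ^$. That is worth keeping in mind when judging whether the fix above is the correct one. A quick check of the CPython behaviour:

```python
import re

# CPython accepts an empty subject string; whether it matches depends on the pattern.
print(re.compile("abc").search(""))              # None: the pattern needs at least one character
print(re.compile("a*").search("").span())        # (0, 0): an empty match at position 0
print(re.compile("^$").search("") is not None)   # True: ^$ matches the empty string
```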
With those fixed, CPython takes 19 seconds and shedskin takes 11.5 seconds on my (old) desktop. As a ratio, this gives SS a score of 3.5 for patmch:1t. Then with 58 for CPython and 71 for SS, that's a score of 15.4 for patmch:2t.
not too bad, but I *think* shedskin may still suffer from its slow file IO implementation here, because we're ploughing through quite a big file. same thing for the dict benchmark. perhaps that explains why that one is slower than cpython.
I was also surprised by the slower speed for 2t, but I'll send the fixed code to attractivechaos and see what he comes up with.
thanks! :) would be nice in any case not to have '999' there..
mark.
--
http://www.youtube.com/watch?v=E6LsfnBmdnk
mark.
Sorry, I saw here that the latest version for Windows already includes the compiler...
I got the code for the dict benchmark and I am going to try it now.
thanks,
mark.
2011/6/24 Enzo Erbano <enzo....@gmail.com>:
> I also tried to compile the code with MSVC, and later to
> add support for Intel C++. The problem I found is that the code includes
> very GCC-specific headers, and I found no replacement for MSVC.
> The headers are:
> #include <gc/gc_allocator.h>
> #include <gc/gc_cpp.h>
These files are those of boehm gc¹ on which Shedskin depends. You
have to add them to your include path whatever compiler you use.
Best regards,
¹ http://www.hpl.hp.com/personal/Hans_Boehm/gc/
--
Jérémie
The gc/* files are from the Boehm GC library; AFAIK they should compile under MSVC as well... If the Shed Skin versions don't, the ones from the Boehm website might work? But I've found the GCC route most reliable so far myself...
I didn't use the -v option the first time I tried, but now that I have, I'm getting a problem because unordered_map only exists in MSVC 2010, not in older versions.
Here, it seems that GC is used all over the app, even where it is not necessary!
In total, there are around 100 places where exceptions are thrown.
Or replace it with a better garbage collector: some time ago I heard that the one used in Google Chrome is one of the best available, and that seems to be true, as Google Chrome beats all browsers in benchmarks [take a look at this for proof: http://clients.futuremark.com/peacekeeper/index.action].
Hi,
Try this, but I'm afraid replacing GC_malloc/GC_free by malloc/free
would actually slow down programs on the average, since they would
then leak memory, and have to swap to disk after a certain amount of
running time. Using shared_ptrs should be slightly better, but would
suffer from the same problems in some corner cases (or lead to a much
more complex program to avoid memory leaks).
As pointed out by Mark, the task comes down to a space optimization
problem. The program in question takes roughly 600 MB of RAM for
a 90 MB dataset (which could theoretically be reduced to ~20 MB), and
is essentially page faulting at each lookup (possibly multiple
times; likewise when scanning).
A linear probing hash table (with a special value for unused slots)
would give slightly better results here, and optimizing for memory usage
would help too.
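Here is a minimal sketch of the idea, assuming nothing about Shed Skin's actual dict internals. Keys live directly in an open-addressed array, and a unique sentinel object marks unused slots, so no separate "used" field is needed. A collision is resolved by stepping to the next slot. The class name, the 2/3 load-factor threshold and the growth factor of 4 are choices made only for this illustration.

```python
# Illustrative linear-probing hash table (hypothetical; not Shed Skin's dict).
EMPTY = object()  # sentinel marking an unused slot


class LinearProbingTable:
    def __init__(self, capacity=8):
        # capacity must be a power of two so that `hash & mask` selects a slot
        self.keys = [EMPTY] * capacity
        self.values = [None] * capacity
        self.mask = capacity - 1
        self.size = 0

    def _slot(self, key):
        """Index holding `key`, or the first empty slot of its probe run."""
        i = hash(key) & self.mask
        while self.keys[i] is not EMPTY and self.keys[i] != key:
            i = (i + 1) & self.mask  # linear probing: try the next slot
        return i

    def __setitem__(self, key, value):
        i = self._slot(key)
        if self.keys[i] is EMPTY:
            self.keys[i] = key
            self.size += 1
        self.values[i] = value
        # grow once the load factor exceeds 2/3, so an empty slot always exists
        if 3 * self.size > 2 * len(self.keys):
            self._resize(4 * len(self.keys))

    def __getitem__(self, key):
        i = self._slot(key)
        if self.keys[i] is EMPTY:
            raise KeyError(key)
        return self.values[i]

    def _resize(self, capacity):
        old = [(k, v) for k, v in zip(self.keys, self.values) if k is not EMPTY]
        self.keys = [EMPTY] * capacity
        self.values = [None] * capacity
        self.mask = capacity - 1
        self.size = 0
        for k, v in old:
            self[k] = v  # re-insert every live entry into the larger table


table = LinearProbingTable()
for i in range(100):
    table[str(i)] = i
print(table["42"], table.size)  # 42 100
```

Deletion is left out. Supporting it would need a second "dummy" marker (a tombstone), so that probe runs passing through a removed entry are not cut short.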
François.
So good luck, because you're basically trying to optimize either GC or GCC.
According to gprof, the main Shed Skin bottleneck is dict::lookup
(and str::__eq__...); allocations account for only 1% of user time.
François.
Hi,
I tried the latest gc-7.2alpha6,
built with ./configure --enable-cplusplus --enable-parallel-mark
--enable-threads=pthreads --enable-thread-local-alloc
The code is way faster on multicore machines this way...
François.
Within 13% of Python 2.6.5:
- cpython: 4.2s
- shedskin-gc7.2-multi-proc: 4.8s
- shedskin-gc7.2-single-proc: 5.5s
- shedskin-gc6.8 (gitorious version): 6.6s
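The "13%" figure depends on which time is used as the base. Measured against Shed Skin's multi-processor time, the gap is 1 - 4.2/4.8 = 12.5%. Measured against CPython's time, it is closer to 14%. A quick check over the timings above:

```python
cpython = 4.2  # seconds
timings = {
    "shedskin-gc7.2-multi-proc": 4.8,
    "shedskin-gc7.2-single-proc": 5.5,
    "shedskin-gc6.8": 6.6,
}
for name, t in timings.items():
    slowdown = t / cpython - 1  # extra time relative to CPython
    gap = 1 - cpython / t       # time CPython saves, relative to this run
    print(f"{name}: {slowdown:.1%} slower than CPython, gap {gap:.1%} of its own time")
```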
François.
mark.
Yup.
Got it on par with Python by using a linear probing hash table, and
tweaking dict resizing policy a little bit.
See attached patch for the curious.
Don't think it's worth applying though,
since it doesn't preserve collision order and test 194 will regress here:
_hextochr = dict(('%02x' % i, chr(i)) for i in range(256))
_hextochr.update(('%02X' % i, chr(i)) for i in range(256))
print(repr(_hextochr))
(sorted output is OK though).
I may be inclined to provide a more cautious patch if really needed.
François.
> dict-ordering is not guaranteed in the python spec since it's based on
> hashing which could change between even similar computers (x86 vs x64).
Thanks Fahrzin, I was surprised too, but this test just ensures that.
There is an interesting dictnotes.txt file in the Python distribution
which explains the rationale behind the hash table implementation. It
is optimized for small dictionaries (up to 16 elements), which are very
common in Python (attribute storage, string formatting, etc.). We could
maybe take some liberties here if it doesn't break anything.
François.
mmm, something must be wrong, Mark;
don't know how the Ubuntu package was compiled.
Are you sure you're pointing to the right library?
Did you try to compile it yourself, with the flags I provided yesterday,
possibly adding --enable-large-config and exporting
CXXFLAGS="-march=native -O3"?
François.
Please double-check your LD_LIBRARY_PATH:
$ export LD_LIBRARY_PATH=/usr/lib/ # gc 6.8
$ time ./dico < in
(1227153, 18)
real 0m4.952s
user 0m4.770s
sys 0m0.180s
$ export LD_LIBRARY_PATH=/usr/local/lib/ # gc 7.2
$ time ./dico < in
(1227153, 18)
real 0m4.235s
user 0m6.550s
sys 0m0.300s
Notice how user time now exceeds real time with gc 7.2: it is effectively taking advantage of multiple cores.
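Here is a rough way to read these numbers. When user + sys exceeds the real (wall-clock) time, more than one core was busy on average. Dividing one by the other gives an approximate average number of busy cores. This is a rule of thumb, not a precise measure. Applied to the two runs above:

```python
# (real, user, sys) in seconds, copied from the two `time` runs above
runs = {
    "gc 6.8": (4.952, 4.770, 0.180),
    "gc 7.2": (4.235, 6.550, 0.300),
}
for name, (real, user, sys_) in runs.items():
    busy = (user + sys_) / real  # average number of CPUs doing work
    print(f"{name}: ~{busy:.2f} CPUs busy on average")
```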
François.
Well, this really starts to get annoying when the load factor exceeds
60~70%, which has no chance to happen in my version (unless
you use a really bad hash function...).
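The textbook approximations for linear probing help explain why, assuming uniformly distributed hashes (Knuth's model, not a measurement of either implementation). At load factor α, a successful lookup takes about ½(1 + 1/(1−α)) probes, and an unsuccessful one about ½(1 + 1/(1−α)²). Tabulating them:

```python
def expected_probes(load):
    """Knuth's approximations for linear probing at load factor `load` (0 <= load < 1)."""
    hit = 0.5 * (1 + 1 / (1 - load))        # successful search
    miss = 0.5 * (1 + 1 / (1 - load) ** 2)  # unsuccessful search / insertion
    return hit, miss


for load in (0.5, 2 / 3, 0.7, 0.8, 0.9):
    hit, miss = expected_probes(load)
    print(f"load {load:.2f}: ~{hit:.1f} probes per hit, ~{miss:.1f} per miss")
```

The cost of a miss grows with the square of 1/(1−α), which is why keeping the load factor low matters far more for linear probing than for other collision strategies.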
> As for the resize policy, I don't know if resizing less often is
> advantageous for smaller dicts or not. For this huge dict the memory
> saved probably more than makes up for the higher load factor. Nothing
> can be set in stone, it's about what works better in a given context,
> I guess...
True. My version also uses more memory, so iterating over it should take a
little longer (16.7% on average, but more in the worst case).
> On a related note, as CPython stores pointers to objects instead of
> the objects themselves, empty and dummy are stored as special pointer
> values, which helps in the memory department by eliminating the "use"
> field as well. But that can't be done for shedskin as it would cause
> conflicts when using ints...
Well, it's C++! We can specialize the template for value/pointer semantics.
But those builtin.* files are starting to get very big... What about
switching to a directory structure: builtin/dict.hpp etc? I'm also a
bit annoyed by these public members accessed all over the place which
impair the great refactorings we may be interested in. Maybe this
should be done in another project? I don't know.
François.
Enzo,
try it for yourself,
this won't compile!
Even if you get it to compile, this will break Shed Skin...
François.
Perfect Mark!!
Good decision.
>> bit annoyed by these public members accessed all over the place which
>> impair the great refactorings we may be interested in. Maybe this
>
> so let's start cleaning that up.. ;-)
Yeah! What do we do first?
Splitting then canonicalizing seems logical?
Let all volunteers come forward!
>> should be done in another project? I don't know.
>
> uh, you don't mean to develop the builtins as a project separate from
> shedskin..? :-)
Nope I meant forking Shed Skin as a whole ;)
There may be a memory leak, but... the big difference I see is that
Python shares strings, since they are immutable.
So here, you basically have 50 million strings, whereas Python
manages a constant 1.2 million of them...
That's precisely why I want to refactor the builtins ;)
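In CPython, string literals in compiled code can be shared automatically. Strings created at run time (for example, lines read from a file) are separate objects, even when their contents are equal. Sharing can be requested explicitly with sys.intern (a builtin intern() in Python 2). The Python sketch below only illustrates the idea. It does not show how the Shed Skin builtins would implement string sharing.

```python
import sys

# Strings built at run time are distinct objects even when equal.
a = "".join(["fo", "o"])
b = "".join(["f", "oo"])
print(a == b, a is b)  # True False

# sys.intern() maps equal strings to one shared object.
a, b = sys.intern(a), sys.intern(b)
print(a is b)  # True
```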
François.
Interesting... this appears to consume half as much memory.
> So you could run a second test to see how much difference it makes. I
> don't really know how Python manages the str type, but if there is a
> 50-fold difference to the number of strings stored I guess it's
> worthwhile to look into that...
Indeed. Is this sufficient evidence?
$ python -c'a="foo";b="foo";c="foo";assert id(a)==id(b)==id(c)'
I don't know how to accurately measure memory consumption on Windows,
but for a quick test you can allocate dictentries with type information
(so that only the key and value words are treated as possible pointers)
by including gc_typed.h and changing myallocate<K,V> from GC_MALLOC(n) to
typedef dictentry<K,V> T;
GC_descr T_descr;
GC_word T_bitmap[GC_BITMAP_SIZE(T)] = {0};
GC_set_bit(T_bitmap, GC_WORD_OFFSET(T,key));
GC_set_bit(T_bitmap, GC_WORD_OFFSET(T,value));
T_descr = GC_make_descriptor(T_bitmap, GC_WORD_LEN(T));
return GC_CALLOC_EXPLICITLY_TYPED(n/sizeof(T), sizeof(T), T_descr);
I think I will write a little script today to summarize the output.
Printed the pid,
looked at it with top.
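As an alternative to watching top, a process can report its own peak memory on Unix-like systems through the standard resource module. The module is not available on Windows. This sketch measures the Python process itself. For a compiled Shed Skin binary, an external tool would be needed instead.

```python
import resource
import sys

# Peak resident set size of the current process (Unix only).
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
# Linux reports ru_maxrss in kilobytes, macOS in bytes.
unit = "bytes" if sys.platform == "darwin" else "KB"
print(f"peak RSS: {peak} {unit}")
```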
What strange IO error do you get with the id() example?
even if it didn't share any strings, reference counting would still make
sure cpython only keeps the strings that are in the dict allocated..
For GC_MALLOC_IGNORE_OFF_PAGE:
2486272
7221248
From what I gather from the documentation, when you use
IGNORE_OFF_PAGE you're guaranteeing that a pointer into the first 512 bytes
of the object will be kept. As the dict object keeps a pointer to the
very beginning of the table, I don't see why this wouldn't be true as
long as the table is still useful. It does say that it's best to
declare the pointer volatile to keep it from being fiddled with by the
compiler, but I don't know what sorts of impacts this could have...
cat the input 10 times, and shedskin memory usage goes through the roof
(about 2 GB here, after which the GC quits with an error), while
cpython barely goes over 70 MB. of course fixing such a leak can also
only be good for performance..