shedskin test.py
make
it then fails with
"In file included from test.cpp:2:
/usr/local/lib/python2.6/dist-packages/shedskin/lib/itertools.hpp: In
function ‘__itertools__::izipiter<T, U>* __itertools__::izip(int,
__shedskin__::pyiter<T>*, __shedskin__::pyiter<B>*) [with T =
__shedskin__::str*, U = __shedskin__::str*]’:
test.cpp:110: instantiated from here
/usr/local/lib/python2.6/dist-packages/shedskin/lib/itertools.hpp:773:
error: no matching function for call to
‘__itertools__::izipiter<__shedskin__::str*,
__shedskin__::str*>::izipiter(__shedskin__::pyiter<__shedskin__::str*>*&,
__shedskin__::pyiter<__shedskin__::str*>*&)’
/usr/local/lib/python2.6/dist-packages/shedskin/lib/itertools.hpp:743:
note: candidates are: __itertools__::izipiter<T,
T>::izipiter(__shedskin__::pyiter<T>*) [with T = __shedskin__::str*]
/usr/local/lib/python2.6/dist-packages/shedskin/lib/itertools.hpp:740:
note: __itertools__::izipiter<T, T>::izipiter() [with
T = __shedskin__::str*]
/usr/local/lib/python2.6/dist-packages/shedskin/lib/itertools.hpp:687:
note: __itertools__::izipiter<__shedskin__::str*,
__shedskin__::str*>::izipiter(const
__itertools__::izipiter<__shedskin__::str*, __shedskin__::str*>&)
make: *** [test] Error 1
"
I am on ubuntu lucid.
Am I just compiling it wrong?
Raphael
--- full python source ---
#!/usr/bin/python
import math
import string
import random
import itertools
n = 20
l = int(math.floor(math.pow(27,3.0/4.0)))
sigma = int(math.floor(math.sqrt(n)))
assert(sigma < 10)
alphabet = (string.digits)[0:sigma]
#Create two random strings
def string_generator(size=l, chars=alphabet):
return ''.join(random.choice(chars) for x in range(size))
def hamdist(str1, str2):
"""Count the # of differences between equal length strings str1
and str2"""
diffs = 0
for ch1, ch2 in itertools.izip(str1, str2):
if ch1 != ch2:
diffs += 1
return diffs
unknown = string_generator()
pattern = string_generator(2*l-1)
#Now work out what the Hamming distance are
hds = []
for i in xrange(0,l):
hd = hamdist(unknown, pattern[i:l+i])
print hd,
hds.append(hd)
stringsleft =[''.join(y) for y in itertools.product(alphabet, repeat =
l)]
for i in xrange(0,l):
stringsleft = set(stringsleft).intersection(set([y for y in
stringsleft if hamdist(pattern[i:i+l], y) == hds[i]]))
print len(stringsleft)
--
You received this message because you are subscribed to the Google Groups "shedskin-discuss" group.
To post to this group, send email to shedskin...@googlegroups.com.
To unsubscribe from this group, send email to shedskin-discu...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/shedskin-discuss?hl=en.
2011/12/8 刘振海 <1989...@gmail.com>:
> 2011/12/8 lesshaste <drr...@gmail.com>
>>
>> I am trying to use shedskin version 0.9 for the first time (in a long
>> time). I have a simple toy python script which I attach at the
>> bottom. I saved it as test.py. These are the steps I carried out
>>
>> shedskin test.py
>> make
>>
>> it then fails with
>>
>> <snip>
>>
> I tried it under msvc ,it fails too.
> there is something wrong with the itertools.hpp
Fixed, waiting for the fix to be merged into mainline.
https://gitorious.org/shedskin/mainline/merge_requests/1
Thanks for reporting, best regards,
--
Jérémie
--
Jérémie
--
You received this message because you are subscribed to the Google Groups "shedskin-discuss" group.
To post to this group, send email to shedskin...@googlegroups.com.
To unsubscribe from this group, send email to shedskin-discu...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/shedskin-discuss?hl=en.
I optimized str.__ne__ somewhat in GIT. after that, set(stringsleft)
was now clearly the slowest part of the program (and cpython is really
fast at this of course). but it looks like it is not needed, really..?
see below. now the compiled code (using shedskin -bw from GIT) runs in
about 6.5 seconds here, versus 14 seconds for the original program
using cpython. about 60% of the total time is now spent on the
following line (half in join, half in itertools.product):
stringsleft =[''.join(y) for y in itertools.product(alphabet, repeat =
l)]
so another minor optimization here could really improve things
further. looking into this next..
thanks for triggering!
mark.
srepmub@akemi:~/shedskin$ diff -u orig.py hup.py
--- orig.py 2011-12-10 11:25:22.247936948 +0100
+++ hup.py 2011-12-10 11:46:57.639881674 +0100
@@ -20,8 +20,8 @@
"""Count the # of differences between equal length strings str1
and str2"""
diffs = 0
- for ch1, ch2 in itertools.izip(str1, str2):
- if ch1 != ch2:
+ for i in range(len(str1)):
+ if str1[i] != str2[i]:
diffs += 1
return diffs
@@ -39,6 +39,5 @@
l)]
for i in xrange(0,l):
- stringsleft = set(stringsleft).intersection(set([y for y in
-stringsleft if hamdist(pattern[i:i+l], y) == hds[i]]))
+ stringsleft = set([y for y in stringsleft if hamdist(pattern[i:i
+l], y) == hds[i]])
print len(stringsleft)
On Dec 8, 9:23 pm, Mark Dufour <mark.duf...@gmail.com> wrote:
> thanks guys! jeremie, I pushed your fix.
>
> note that the program becomes much faster when using the more optimized
> 'zip' builtin here, and faster still by using something like the following:
>
> for i in range(len(str1)):
> if str1[i] != str2[i]:
>
> still this makes it only marginally faster than the python program on my
> pc. there are a few things we could improve here and there, but there's no
> clear bottleneck at this point..
>
> thanks again,
> mark.
>
> 2011/12/8 Jérémie Roquet <arkano...@gmail.com>
>
>
>
>
>
>
>
>
>
> > Hi,
>
> > 2011/12/8 刘振海 <1989l...@gmail.com>:
jeremie, pre-allocating tuples as in the following makes it go a lot
faster. useful I think, because product is probably the most used
function in itertools, but it also seems applicable to more functions
in itertools (and small-object-allocation in shedskin in general
perhaps.. ;-))
example goes from 5.8 to 4.8 seconds with this. the 1000 is probably a
bit too much though. a value of 25 also makes the time go down to
about 5.0 seconds. what do you think?
thanks,
mark.
if (this->iter.size()) {
- for (int i = 0; i < (int)this->iter.size(); ++i) {
+ int itersize = (int)this->iter.size();
+ tuple->units.resize(itersize);
+ for (int i = 0; i < itersize; ++i) {
int j = this->iter[i];
- tuple->units.push_back(this->values[j][this-
>indices[i]]);
+ tuple->units[i] = this->values[j][this->indices[i]];
--- itertools.hpp 2011-12-10 14:17:56.875495121 +0100
+++ newiter.hpp 2011-12-10 13:20:39.355641798 +0100
@@ -1024,6 +1024,9 @@
std::vector<unsigned int> iter;
std::vector<unsigned int> indices;
+ tuple2<T, T> *__tuple_cache;
+ int __tuple_count;
+
productiter();
void push_iter(pyiter<T> *iterable);
@@ -1037,6 +1040,8 @@
template<class T> inline productiter<T, T>::productiter() {
this->exhausted = false;
this->highest_exhausted = 0;
+ this->__tuple_cache = new tuple2<T,T>[1000];
+ this->__tuple_count = 0;
}
template<class T> void productiter<T, T>::push_iter(pyiter<T>
*iterable) {
@@ -1079,7 +1084,11 @@
throw new StopIteration();
}
- tuple2<T, T> *tuple = new tuple2<T, T>;
+ tuple2<T, T> *tuple = &(this->__tuple_cache[this->__tuple_count+
+]);
+ if(this->__tuple_count == 1000) {
+ this->__tuple_count = 0;
+ this->__tuple_cache = new tuple2<T,T>[1000];
+ }
On Dec 10, 1:37 pm, srepmub <mark.duf...@gmail.com> wrote:
Sorry for the late and very short reply.
2011/12/10 srepmub <mark....@gmail.com>:
> jeremie, pre-allocating tuples as in the following makes it go a lot
> faster. useful I think, because product is probably the most used
> function in itertools, but it also seems applicable to more functions
> in itertools (and small-object-allocation in shedskin in general
> perhaps.. ;-))
Yeah, that's a great idea. Not only this is faster, but it also avoids
spending too much memory for the heap-admin, since the tuples are
small.
> example goes from 5.8 to 4.8 seconds with this. the 1000 is probably a
> bit too much though. a value of 25 also makes the time go down to
> about 5.0 seconds. what do you think?
I don't have time to set up any benchmark yet, but I guess the more
the better. Of course, it has to be a reasonable value, otherwise the
whole interest in using iterators is gone, but I don't think 1000 is
too much.
On the other hand, this may lead to some serious overhead if we are
combining small containers, so we should allocate std::min(1000, the
maximum number of tuples we'll actually need) and not 1000 (eg.
product('AB', 'CD') should allocate 4 tuples and not more).
2011/12/10 srepmub <mark....@gmail.com>:
> 3.5 seconds :-)
>
> if (this->iter.size()) {
> - for (int i = 0; i < (int)this->iter.size(); ++i) {
> + int itersize = (int)this->iter.size();
> + tuple->units.resize(itersize);
> + for (int i = 0; i < itersize; ++i) {
> int j = this->iter[i];
> - tuple->units.push_back(this->values[j][this-
>>indices[i]]);
> + tuple->units[i] = this->values[j][this->indices[i]];
> }
I wonder how using .reserve() instead would perform.
Best regards,
--
Jérémie
I don't have time to set up any benchmark yet, but I guess the more
the better. Of course, it has to be a reasonable value, otherwise the
whole interest in using iterators is gone, but I don't think 1000 is
too much.
On the other hand, this may lead to some serious overhead if we are
combining small containers, so we should allocate std::min(1000, the
maximum number of tuples we'll actually need) and not 1000 (eg.
product('AB', 'CD') should allocate 4 tuples and not more).
I wonder how using .reserve() instead would perform.
I _think_ product can be quite a bit faster still, esp. in cases where
the iterator is consumed right away (as in a 'for .. in product(..)'
or list(product(..))), by adding some things to class productiter so
that FOR_IN can specialize things more aggressively (as it does right
now for iteration over most builtins).
this was actually jeremie's idea long ago, but implemented after his
contribution of itertools.. :-)
thanks,
mark.