//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;
int main(){
vector<string> a;
for (long int i=0; i<10000 ; ++i){
a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<string> b(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}
#python
def f():
a = []
for i in range(10000):
a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s
I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?
> Hi, I'm learning STL and I wrote some simple code to compare the
> efficiency of python and STL.
>
> //C++
> #include <iostream>
> #include <string>
> #include <vector>
> #include <set>
> #include <algorithm>
> using namespace std;
>
> int main(){
> vector<string> a;
> for (long int i=0; i<10000 ; ++i){
> a.push_back("What do you know?");
> a.push_back("so long...");
> a.push_back("chicken crosses road");
> a.push_back("fool");
> }
> set<string> b(a.begin(), a.end());
> unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
> }
Why are you using `unique_copy` here?
> #python
> def f():
> a = []
> for i in range(10000):
> a.append('What do you know')
> a.append('so long...')
> a.append('chicken crosses road')
> a.append('fool')
> b = set(a)
> for s in b:
> print s
>
> I was using VC++.net and IDLE, respectively. I had expected C++ to be
> way faster. However, while the python code gave the result almost
> instantly, the C++ code took several seconds to run! Can somebody
> explain this to me? Or is there something wrong with my code?
There's a difference in data structures at least. The Python `set` type
is implemented with a hash algorithm, so the equivalent STL type would be
`hash_set`. `set` in Python does not store its contents sorted.
Ciao,
Marc 'BlackJack' Rintsch
Sorry, that's a typo. Actually I used 'copy'.
>
> > #python
> > def f():
> > a = []
> > for i in range(10000):
> > a.append('What do you know')
> > a.append('so long...')
> > a.append('chicken crosses road')
> > a.append('fool')
> > b = set(a)
> > for s in b:
> > print s
> >
> > I was using VC++.net and IDLE, respectively. I had expected C++ to be
> > way faster. However, while the python code gave the result almost
> > instantly, the C++ code took several seconds to run! Can somebody
> > explain this to me? Or is there something wrong with my code?
>
> There's a difference in data structures at least. The Python `set` type
> is implemented with a hash algorithm, so the equivalent STL type would be
> `hash_set`. `set` in Python does not store its contents sorted.
>
> Ciao,
> Marc 'BlackJack' Rintsch
Thank you for your comments. I tested with hash_set, but I didn't see
much performance improvement. When I increased the loop to 1 million
times, the python code still ran reasonably fast and the C++ code got
stuck there. This totally surprised me, because according this page
http://norvig.com/python-lisp.html, the speed of python is nowhere near
that of C++.
Hi,
I'm no C++ guru so cannot comment on the C++ code itself, however I do
wonder if you tested your C++ code with other STL implementation such
as gcc (gcc is available on windows as well, in various versions).
What could be is that expanding the list in C++ is done in very small
increments, leading to many re-allocations. Is it possible to
pre-allocate the vector<> with sufficient entries?
Also, your Python code as quoted, doesn't actually call your function
f(). If you say that you get results instantly, I assume that you mean
all 4 strings are actually printed to console?
(I'm surprised that the console prints things that fast).
btw, using range() in Python isn't very efficient, I think... Better to
use xrange().
Asked a C++ collegue of mine to comment, and he strongly suspects that
you're actually running it in the .Net runtime (your C++ code contains
some C#-isms, such as omitting the '.h' in the include <> statements).
Luck,
--Tim
The set should be only 4 items in size, according to my reading of the
code, so set implementation differences shouldn't lead to drastic
performance differences.
> Ciao,
> Marc 'BlackJack' Rintsch
Cheers,
--Tim
> (your C++ code contains some C#-isms, such as omitting the '.h' in the
> include <> statements).
That's no C#-ism, that's C++. The standard C++ header names don't have a
trailing '.h'. ``gcc`` prints deprecation warnings if you write the
names with '.h'.
Ciao,
Marc 'BlackJack' Rintsch
We stand corrected.
--Tim
> I was using VC++.net and IDLE, respectively. I had expected C++ to be
> way faster. However, while the python code gave the result almost
> instantly, the C++ code took several seconds to run! Can somebody
> explain this to me? Or is there something wrong with my code?
in the Python example, the four strings in your example are shared, so
you're basically copying 40000 pointers to the list.
in the C++ example, you're creating 40000 string objects.
</F>
I suspect that other than what other posters have suggested about your
code, the difference in speed is due to the way you build your C++
executable...
HTH,
Ray
> Hi, I'm learning STL and I wrote some simple code to compare the
> efficiency of python and STL.
> I was using VC++.net and IDLE, respectively. I had expected C++ to be
> way faster. However, while the python code gave the result almost
> instantly, the C++ code took several seconds to run! Can somebody
> explain this to me? Or is there something wrong with my code?
Just a guess: immutable strings might be Python's advantage. Due to your
"benchmark"'s simplicity you end up with 10000 string instances in C++ and
just four str-s (and a lot of pointers) in Python.
What happens if you replace 'string' with 'const char *' in C++ ?
(Note that this modification is a bit unfair to Python as it would not
detect equal strings in different memory locations)
Peter
In which case, Licheng, you should try using the /GF switch. This will
tell Microsoft C++ compiler to pool identical string literals together.
:)
>> in the C++ example, you're creating 40000 string objects.
>
> In which case, Licheng, you should try using the /GF switch. This will
> tell Microsoft C++ compiler to pool identical string literals together.
in what way does that change the implementation of C++'s string type ?
</F>
The code still creates a new string - instance each time it tries to
append a const char* to the vector<string> ...
You should instead create the string-objects ahead of time, outside of
the loop.
Regards,
--Tim
> I was using VC++.net and IDLE, respectively. I had expected C++ to be
> way faster. However, while the python code gave the result almost
> instantly, the C++ code took several seconds to run! Can somebody
> explain this to me? Or is there something wrong with my code?
It must be the debugging, the compiler or a poor STL implementation. With
gcc 4 it runs instantly on my computer (using -O2), even with 10x the
number of values.
If the problem is that C++ has to make lots of new strings, as other posters
have suggested, then you could do something like
const string foo = "What do you know?";
for (long int i=0; i<10000 ; ++i){
   a.push_back(foo);
...
}
as many C++ implementations use reference counting for identical strings.
Jeremy
--
Jeremy Sanders
http://www.jeremysanders.net/
As a matter of fact, do not count on that. Use a vector<string*> just in
case.
Alternatively, slow down the Python implementation by making Python
allocate new strings each time round:
a.append('%s' % 'What do you know')
... for each of your string-appends. But even then, the python-code is
still near-instant.
Cheers,
--Tim
Ah, yes what was I thinking? The fact that it stores std::string
objects escaped my mind somehow. /GF just pools the string literals.
Thanks for the correction.
>
> </F>
Yeah, you're right... I've been programming Java too long :)
Took me a while to see that too! Have been programming too much Java /
Python as well. Anyways, when changing the Python version so that it
adds 40.000 unique strings to the list (and proving that there are
indeed 40.000 unique ids in the list, by making a set of all id()s in
the list and taking the len() of that set), it still takes at most a
second. I cannot test the speed of the c++ version on my computer, so
nothing scientific here.
I'm curious though, if on the OP's machine the slowed-down Python
version is still faster than the C++ version.
Cheers,
--Tim
> It must be the debugging, the compiler or a poor STL implementation. With
> gcc 4 it runs instantly on my computer (using -O2), even with 10x the
> number of values.
$ gcc --version
i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build
5363)
I adapted original poster's code and made a function that did not create
strings each time. The NoisyString is a class we can use to actually
track copying.
In fact Python here is faster. Suppose it has a really optimized set
class...
Here some results (I know that the fpoint optimizations are useless...
it's is my "prebuilt" full optimization macro :) ):
$ g++ -O3 -pipe -O2 -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 5.8
Elapsed 1.71
$ g++ -Os -pipe -O2 -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 5.8
Elapsed 1.71
$ g++ -O3 -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 0.47
Elapsed 0.18
$ g++ -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 0.63
Elapsed 0.33
$ python -O set_impl.py
so long...
What do you know
fool
chicken crosses road
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.370000 seconds
Elapsed: 3.810000 seconds
------------------- PYTHON CODE ---------------------------------
#python
global size
size = 1000000
def f():
a = []
for i in range(size):
a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s
def slow_f():
a = []
for i in range(size):
a.append('%s' % 'What do you know')
a.append('%s' % 'so long...')
a.append('%s' % 'chicken crosses road')
a.append('%s' % 'fool')
b = set(a)
for s in b:
print s
import time
from time import clock
f_start = clock()
f()
f_end = clock()
slow_f_start = clock()
slow_f()
slow_f_end = clock()
print "Elapsed: %f seconds" % (f_end - f_start)
print "Elapsed: %f seconds" % (slow_f_end - slow_f_start)
------------------------------------------------------------------
----------------- CPP CODE -------------------------------------
#include <iostream>
#include <ostream>
#include <iterator>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
#include <ctime>
using namespace std;
#define SIZE 1000000
class NoisyString : public std::string {
public:
NoisyString(const string& cp)
: string(cp)
{
cout << "Fuck I got copied!" << endl;
}
NoisyString(const char* s ) : string(s) {
}
};
void f(){
vector<string> a;
for (long int i=0; i<SIZE ; ++i){
a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<string> b(a.begin(), a.end());
copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}
void fast_f(){
vector<string> a;
string s1 = "What do you know?" ;
string s2 = "so long..." ;
string s3 = "chicken crosses road";
string s4 = "fool" ;
for (long int i=0; i<SIZE ; ++i){
a.push_back(s1);
a.push_back(s2);
a.push_back(s3);
a.push_back(s4);
}
set<string> b(a.begin(), a.end());
copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}
int main(){
clock_t f_start,
f_end,
faster_f_start,
faster_f_end,
fast_f_start,
fast_f_end;
f_start = clock();
f();
f_end = clock();
fast_f_start = clock();
fast_f();
fast_f_end = clock();
cout << "Elapsed " << (f_end - f_start) / double(CLOCKS_PER_SEC) <<
endl;
cout << "Elapsed " << (fast_f_end - fast_f_start) /
double(CLOCKS_PER_SEC) << endl;
}
-----------------------------------------------------------------------
--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.
> I'm curious though, if on the OP's machine the slowed-down Python
> version is still faster than the C++ version.
I tested both on my machine (my other post in the thread)
> In fact Python here is faster. Suppose it has a really optimized set
> class...
Python's memory allocator is also quite fast, compared to most generic
allocators...
</F>
> Python's memory allocator is also quite fast, compared to most generic
> allocators...
In fact also in the two "slow" versions Python outperforms C++.
I didn't notice it in the first place.
But your C++ program outputs times in seconds, right? So all
compilations except for the first two give results in less than a
second, right? (meaning the optimizations of your standard-compilation
give worst results than -O3?)
BTW, I don't quite understand your gcc optimizations for the first 2
compiles anyways: two -O options with different values. Doesn't that
mean the 2nd -O takes preference, and the compilation is at -O2 instead
of -O3?
Why both -O3 and -O2 at the command-line?
Cheers,
--Tim
Well, I guess I'm getting really obsessed with this. But anyways. I
installed MinGW on my Windows-XP (sp2) laptop. It is g++ version 3.4.5
-- ancient, yes, but on windows it's the latest available.
I compiled Mc Osten's C++ program (tweaked the output a little) and ran
it; ran his version of the python code too.
Oh boy; yes indeed the slow python is faster than the fast C++
version... Must be something really awful happening in the STL
implementation that comes with GCC 3.4!
Here's the output from my console:
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ g++ -O3 -march=pentium-m -o SpeedTest SpeedTest.cpp
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 40000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 0.037574 seconds
Slow - Elapsed: 0.081520 seconds
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 2.089 seconds
Slow - Elapsed: 6.303 seconds
LeeuwT@nlshl-leeuwt ~/My Documents/Python
Cheers,
--Tim
> Here some results (I know that the fpoint optimizations are useless...
> it's is my "prebuilt" full optimization macro :) ):
Interesting. The opimisation makes no difference to the speed of the C++ one
for me. I just get
xpc17:~> g++4 -O2 test2.cpp
xpc17:~> ./a.out
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 2.11
Elapsed 1.11
(This is with an Althon 64 4600+ running Linux).
Unfortunately the Python on this computer doesn't have set as it is too old,
so I can't compare it.
> But your C++ program outputs times in seconds, right? So all
> compilations except for the first two give results in less than a
> second, right? (meaning the optimizations of your standard-compilation
> give worst results than -O3?)
Yes. It's in seconds but the benchmark that are one order of magnitudo
less than the others have of a different "size" (100000 instead of
1000000). That is cut and paste from my terminal... I think it's a
mess. I do it all again from scratch.
> BTW, I don't quite understand your gcc optimizations for the first 2
> compiles anyways: two -O options with different values. Doesn't that
> mean the 2nd -O takes preference, and the compilation is at -O2 instead
> of -O3?
> Why both -O3 and -O2 at the command-line?
I forgot I put -O2 in my $FAST_FLAGS. I don't know what I was thinking
about.
This the correct version
$ g++ -Os -pipe -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 6.3
Elapsed 2.1
$ g++ -O2 -pipe -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 5.8
Elapsed 1.7
$ g++ -O3 -pipe -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 5.79
Elapsed 1.72
$ g++ -pipe -march=pentium-m -msse3 -fomit-frame-pointer -mfpmath=sse
-o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 7.12
Elapsed 2.98
$ python -O set_impl.py
so long...
What do you know
fool
chicken crosses road
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.370000 seconds
Elapsed: 3.800000 seconds
> Oh boy; yes indeed the slow python is faster than the fast C++
> version... Must be something really awful happening in the STL
> implementation that comes with GCC 3.4!
And the Python version does the very same number of iterations than the
C++ one? I suppose they are looping on arrays of different sizes, just
like my "first version".
Hmmm.. You're quite right. The C++ version had an array size 100.000
(your version), the Python version still had an array size 10.000 (as
in my modified copy of the original version).
When fixing the Python version to have 100.000 items, like the C++
version, the Python timings are:
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 400000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 0.512088 seconds
Slow - Elapsed: 1.139370 seconds
Still twice as fast as the fastest GCC 3.4.5 compiled version!
Incidentally, I also have a version compiled with VC++ 6 now... (not
yet w/VC++ 7) .. Compiled with release-flags and maximum optimization
for speed, here's the result of VC++ 6:
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest_VC.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 4.481 seconds
Slow - Elapsed: 4.842 seconds
So you can see that it's 'slow' version of the code is faster than the
'slow' version compiled with GCC, but the 'fast' code is barely faster
than the 'slow' code! And the 'fast' version compiled with GCC is much
faster than the 'fast' version compiled with VC++ 6!
My conclusion from that is, that the vector<> or set<> implementations
of GCC are far superior to those of VC++ 6, but that memory allocation
for GCC 3.4.5 (MinGW version) is far worse than that of MSCRT / VC++ 6.
(And Python still smokes them both).
Cheers,
--Tim
And the results of IronPython (1.0rc2) are just in as well:
IronPython 1.0.60816 on .NET 2.0.50727.42
Copyright (c) Microsoft Corporation. All rights reserved.
>>>
>>> import sys
>>> sys.path.append('c:/documents and settings/leeuwt/my documents/python')
>>> import SpeedTest
>>> SpeedTest.run_test()
Begin Test
Number of unique string objects: 4
What do you know
so long...
chicken crosses road
fool
Number of unique string objects: 400000
What do you know
so long...
chicken crosses road
fool
Fast - Elapsed: 1.287923 seconds
Slow - Elapsed: 4.516272 seconds
>>>
And for Python 2.5:
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python25/python.exe SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 400000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 0.440619 seconds
Slow - Elapsed: 1.095341 seconds
LeeuwT@nlshl-leeuwt ~/My Documents/Python
But beware! For Python2.5 I had to change the code slightly, because it
already realized that the expression
'%s' % 'something'
will be a constant expression, and evaluates it once only... so I had
to replace '%s' with a variable, and I got the timings above which show
Python2.5 to be slightly faster than Python2.4.
(Next step would be to create a VB version and a Java version of the
same program, oh and perhaps to try a version that would work with
Jython... perhaps somehow w/o the 'set')
Cheers,
--Tim
Tim> '%s' % 'something'
Tim> will be a constant expression, and evaluates it once only... so I
Tim> had to replace '%s' with a variable, and I got the timings above
Tim> which show Python2.5 to be slightly faster than Python2.4.
Shouldn't you then get rid of any compiler optimizations your C++ compiler
does? Why penalize 2.5 because it recognizes a useful optimization?
Tim> (Next step would be to create a VB version and a Java version of
Tim> the same program, oh and perhaps to try a version that would work
Tim> with Jython... perhaps somehow w/o the 'set')
I don't recall the example exactly, but couldn't you just create a set class
that uses a dict under the covers and only implement the methods you need
for the test?
Skip
Maybe I'm missing something but the posted c++codes are not equivalent IMO to
what python is doing. I discarded the "slow" version, and tried to get the
equivalent in c++ of :
"""
#!/usr/bin/env python
size = 1000000
def f():
a = []
for i in range(size):
a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s
import time
from time import clock
f_start = clock()
f()
f_end = clock()
print "Elapsed: %f seconds" % (f_end - f_start)
"""
I came at first with the following, which is still slower than the python
version :
"""
void print_occurence_of_unique_strings(){
vector<string> a;
const string& s1 = "What do you know?" ;
const string& s2 = "so long..." ;
const string& s3 = "chicken crosses road";
const string& s4 = "fool" ;
for (long int i=0; i<SIZE ; ++i){
a.push_back(s1);
a.push_back(s2);
a.push_back(s3);
a.push_back(s4);
}
set<string> b(a.begin(), a.end());
copy(b.begin(), b.end(),
ostream_iterator<string>(cout, "\n"));
}
"""
Here, all strings, while passed by reference to the vector, are copied one by
one.
Then, I tried this, it just overcome the performance of python code, but not
in the proportion I expected :
"""
void print_occurence_of_unique_strings_compare_by_adresses(){
vector<string*> a;
string s1 = "What do you know?";
string s2 = "so long...";
string s3 = "chicken crosses road";
string s4 = "fool";
for (long int i=0; i<SIZE ; ++i){
a.push_back(&s1);
a.push_back(&s2);
a.push_back(&s3);
a.push_back(&s4);
}
set<string> b;
for (vector<string*>::iterator it=a.begin(); it!=a.end(); it++)
b.insert(**it);
copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}
"""
The problem here, is that the strings in the set are compared by value, which
is not optimal, and I guess python compare them by adress ("s*n is s*n" has
the same complexity than "s*n == s*n" in CPython, right ?).
so, finally, the code in c++, about ten times faster than the equivalent in
python, must be :
"""
void print_occurence_of_unique_strings_compared_by_address(){
cout << "print_occurence_of_unique_strings_compared_by_address" << endl;
vector<string*> a;
string s1 = "What do you know?";
string s2 = "so long...";
string s3 = "chicken crosses road";
string s4 = "fool";
for (long int i=0; i<SIZE ; ++i){
a.push_back(&s1);
a.push_back(&s2);
a.push_back(&s3);
a.push_back(&s4);
}
set<string*> b(a.begin(), a.end());
set<string> c; // well ordered set (b is ordered by address)
for (set<string*>::iterator it=b.begin(); it!=b.end(); it++)
c.insert(**it);
copy(c.begin(), c.end(), ostream_iterator<string>(cout, "\n"));
}
"""
the result on my box is :
maric@redflag2 mar aoû 22 22:24:23:~$ g++ -O3 -o testcpp testcpp.cpp
maric@redflag2 mar aoû 22 22:24:29:~$ ./testcpp
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...
strings : 1.89
unique strings : 0.87
compared by address : 0.18
maric@redflag2 mar aoû 22 22:24:38:~$ python2.4 testpython.py
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.680000 seconds
maric@redflag2 mar aoû 22 22:24:51:~$ g++ -v
Using built-in specs.
Target: i486-linux-gnu
Configured
with: ../src/configure -v --enable-languages=c,c++,java,fortran,objc,obj-c++,ada,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.1-1.4.2.0/jre --enable-mpfr --with-tune=i686 --enable-checking=release
i486-linux-gnu
Thread model: posix
gcc version 4.1.2 20060613 (prerelease) (Debian 4.1.1-5)
I've joined the full c++ file as an attachment.
--
_____________
Maric Michaud
_____________
Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097
I agree that Python2.5 recognized a useful optimization, and didn't
wish to penalize it for that, however the optimalization was defeating
the purpose of my code in the first place!
Cheers,
--Tim
> The problem here, is that the strings in the set are compared by value, which
> is not optimal, and I guess python compare them by adress ("s*n is s*n" has
> the same complexity than "s*n == s*n" in CPython, right ?).
wrong.
> timeit -s"s='x'; n=1000" "s*n is n*s"
1000000 loops, best of 3: 1.9 usec per loop
> timeit -s"s='x'; n=1000" "s*n == n*s"
100000 loops, best of 3: 4.5 usec per loop
</F>
Your C++ version got me the following timings (using gcc 3.4.5 as the
compiler, MinGW version, with -O6):
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./testcpp.exe
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...
strings : 2.135
unique strings : 1.103
compared by address : 0.21
For reference, Python's best time was 0.39 seconds on the same computer
(in the 'fast' version, using only 4 unique string instances).
Hmmm... Can we conclude now that carefully crafted C++ code is about
twice as fast as casually and intuitively written Python code? ;) (Just
kidding here of course)
NB: Your code now tests for address-equality. Does it also still test
for string-equality? It looks to me that it does, but it's not quite
clear to me.
Cheers,
--Tim
> NB: Your code now tests for address-equality. Does it also still test
> for string-equality? It looks to me that it does, but it's not quite
> clear to me.
It does it.
set<string*> b(a.begin(), a.end());
set<string> c; // well ordered set (b is ordered by address)
for (set<string*>::iterator it=b.begin(); it!=b.end(); it++)
c.insert(**it);
copy(c.begin(), c.end(), ostream_iterator<string>(cout, "\n"));
When we populate the first set, we get rid of all strings with same
object id/address (it test equality of pointers). Then we populate
another set (and the default equality test is on strings).
However, I would have written the code using a proper compare function
rather than using two sets. In this particular case the number of
elements of the first set is negligible in respect of the initial vector
size, thus copying it again does not take a lot of time.
But such code is optimized for the problem itself: in the real world I
suppose we would have passed set a proper comparison function that
checks address and then string equality.
> My conclusion from that is, that the vector<> or set<> implementations
> of GCC are far superior to those of VC++ 6, but that memory allocation
> for GCC 3.4.5 (MinGW version) is far worse than that of MSCRT / VC++ 6.
> (And Python still smokes them both).
It would be interesting to test it with VC 8 (2005). I have it in my
Parallels vm, but it looks like something is wrong. The very same code
takes almost a minute, I suppose there is something wrong with it
(Python is almost as fast as the python 2.4 on MacOS).
> And the results of IronPython (1.0rc2) are just in as well:
I can't test this one.
>
> And for Python 2.5:
> LeeuwT@nlshl-leeuwt ~/My Documents/Python
> $ /cygdrive/c/Python25/python.exe SpeedTest.py
> Begin Test
> Number of unique string objects: 4
> so long...
> What do you know
> fool
> chicken crosses road
> Number of unique string objects: 400000
> so long...
> What do you know
> fool
> chicken crosses road
> Fast - Elapsed: 0.440619 seconds
> Slow - Elapsed: 1.095341 seconds
What the heck... you have a Cray, haven't you?
$ /opt/misc/bin/python2.5 -O set_impl.py
so long...
What do you know
fool
chicken crosses road
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.300000 seconds
Elapsed: 1.290000 seconds
Yes... good optimizer work. The 'slow' code here is faster than the fast
one.
$ python -O set_impl.py
so long...
What do you know
fool
chicken crosses road
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.360000 seconds
Elapsed: 3.800000 seconds
> (Next step would be to create a VB version and a Java version of the
> same program, oh and perhaps to try a version that would work with
> Jython... perhaps somehow w/o the 'set')
Ok. I can do the Java version. If I find a RealBasic Set class I can do
it. However, I don't remember anything about VB6, and have done nothing
with .Net.
But I don't think it is that interesting. Java strings are immutable
too: I expect it to outperform Python (unless Java Set class sucks). And
I don't see the point of taking in VB.
A good BASIC implentation is comparable with Pascal or C++ speedwise.
(At least this results from Great Language Shootout and Free Basic).
OK, now I'm getting obsessed with this too ;-)
I'm using VC++ Express, I didn't care to tweak the optimizations, I
merely chose the "Release" configuration for the executable. It's
blazing fast, taking only 30+ ms each run.
Here's the code:
int main(){
DWORD begin = ::GetTickCount();
vector<string> a;
string c = "What do you know?";
string d = "so long...";
string e = "chicken crosses road";
string f = "fool";
for (long int i=0; i<10000 ; ++i){
a.push_back(c);
a.push_back(d);
a.push_back(e);
a.push_back(f);
}
set<string> b(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout,
"\n"));
DWORD end = ::GetTickCount();
cout << "Ends in " << (end - begin) << " ms.";
}
And here's the result:
\TestSTL\release>TestSTL.exe
What do you know?
chicken crosses road
fool
so long...
Ends in 31 ms.
I tried the original version:
int main(){
DWORD begin = ::GetTickCount();
vector<string> a;
for (long int i=0; i<10000 ; ++i){
a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<string> b(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout,
"\n"));
DWORD end = ::GetTickCount();
cout << "Ends in " << (end - begin) << " ms.";
}
And the result is only 50% slower:
\TestSTL\release>TestSTL.exe
What do you know?
chicken crosses road
fool
so long...
Ends in 47 ms.
I am a c++ newbie but I think c++ should be faster here.
Maybe someone can post this to the c++ maillist and they will tell how
to accelerate it.
Tim N. van der Leeuw wrote:
Not really, see my test, in my other post in the same thread. I'm using
VC++ Express 2005. If we're comparing with Python 2.5 I think it's just
fair that for C++ we're using the latest as well.
> I am a c++ newbie but I think c++ should be faster here.
Same here, although that said Python's implementation of those data
structure must already be as optimal as mortals can do it.
<snip>
> I'm using VC++ Express, I didn't care to tweak the optimizations, I
> merely chose the "Release" configuration for the executable. It's
> blazing fast, taking only 30+ ms each run.
Of course it is faster. We are looping 1000000 times, you just 10000.
> Not really, see my test, in my other post in the same thread. I'm using
> VC++ Express 2005. If we're comparing with Python 2.5 I think it's just
> fair that for C++ we're using the latest as well.
In your test, you are looping 10000 times, we looped 1000000.
In Python tests with 10000 elements, it was about 10 ms.
Moreover, we tried various Python and C++ configurations. Most of the
tests are done with Python 2.4, not 2.5.
And I used gcc4, that is to say the latest on my platform.
> Same here, although that said Python's implementation of those data
> structure must already be as optimal as mortals can do it.
I think this is the rationale behind it.
> That's to say,
> python is still much faster?
Yes it is. But of course you can't sat that "Python is faster than C++".
We found that the code to do this, written in the most natural way, is a
lot faster in Python. However, if you optimze the code, C++ gets almost
as fast.
In other benchmarks C++ outperforms Python and is 10 or 100 times
faster.
> Maybe someone can post this to the c++ maillist and they will tell how
> to accelerate it.
There are enough C++ experts here to do it. The point is another.
Certainly--I was not comparing 1000000 against 10000. Referring to the
OP's statement: "However, while the python code gave the result almost
instantly, the C++ code took several seconds to run!" 30ms sounds like
a definite improvement over several seconds!
I'll try to tweak it later at home and report here. I'll try out the
1000000 too.
Cheers
Ray
Mine's VC 2005 Express--let me put the optimization parameters later
and measure again when I get home.
<snip>
> Yes it is. But of course you can't sat that "Python is faster than C++".
Of course not. Python is faster then assembler. Proofed @ EuroPython
2006 in CERN, near the LHC Beta, in the same room many Nobel laurates
gave their presentations before.
Harald
> Certainly--I was not comparing 1000000 against 10000. Referring to the
> OP's statement: "However, while the python code gave the result almost
> instantly, the C++ code took several seconds to run!" 30ms sounds like
> a definite improvement over several seconds!
Of course. I suppose there's something broken in OP's C++ setup (in fact
the version I compiled with VCPP 2005 also takes a lot of seconds...
something like 20-30 seconds, but of course this makes me think I
haven't understood how it is supposed to work, since my gcc gives
results comparable to yours).
> Proofed @ EuroPython
> 2006 in CERN, near the LHC Beta, in the same room many Nobel laurates
> gave their presentations before.
Have you some link? I suppose it's kind of a joke they did or something
like that...
Yeah, my guess would be either he used the Debug configuration or he
actually created a Managed executable instead of a pure Win32
application. Sigh, now I can't wait to get home and try it out :)
Ah ! wrong, thanks, but not in our case :
In [80]: for i in (10**e for e in range(5)) :
....: res1 = timeit.Timer('s == t', 's, t = "e"*%i, "e"*%i'%(i,
i)).timeit()
....: res2 = timeit.Timer('s is t', 's, t = "e"*%i, "e"*%i'%(i,
i)).timeit()
....: print res1/res2
....:
....:
1.10532866525
1.27507328965
1.90244004672
8.33974283485
89.5215441627
Indeed, it's wrong for two instances of str, but not if we compare an instance
to itself :
In [79]: for i in (10**e for e in range(9)) :
....: r1=timeit.Timer('s is p', "s, p = ('s'*%s,)*2" % i).timeit()
....: r2=timeit.Timer('s == p', "s, p = ('s'*%s,)*2" % i).timeit()
....: print r1/r2
....:
....:
0.854690643008
0.872682262181
0.851785060822
0.871193603744
0.890304121256
0.86925960859
0.846364097331
0.91614070798
0.825424114324
So my lastest c++ algorithm is the good one still, as the python code is
like :
In [29]: timeit.Timer('set(t)', 't=("e"*10,)*10').timeit()
Out[29]: 1.883868932723999
In [30]: timeit.Timer('set(t)', 't=("e"*100,)*10').timeit()
Out[30]: 1.8789899349212646
and not :
In [27]: timeit.Timer('set(t)', 't=["e"*10 for i in range(10)]').timeit()
Out[27]: 2.6000990867614746
In [28]: timeit.Timer('set(t)', 't=["e"*100 for i in range(10)]').timeit()
Out[28]: 4.1173238754272461
> > timeit -s"s='x'; n=1000" "s*n is n*s"
>
> 1000000 loops, best of 3: 1.9 usec per loop
>
> > timeit -s"s='x'; n=1000" "s*n == n*s"
>
> 100000 loops, best of 3: 4.5 usec per loop
>
> </F>
--
Tim N. van der Leeuw a écrit :
>
> Your C++ version got me the following timings (using gcc 3.4.5 as the
> compiler, MinGW version, with -O6):
>
> ...
>
>
> Hmmm... Can we conclude now that carefully crafted C++ code is about
> twice as fast as casually and intuitively written Python code? ;) (Just
> kidding here of course)
Every C++ code must be carefully crafted :).
But your benchmark surprise me, here is what I get on my windows box :
Python2.4 :
===========
maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ which python2.4
/usr/bin/python2.4
maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ which python
/cygdrive/c/Python24/python
maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ python2.4 testpython.py
so long...
What do you know
fool
chicken crosses road
Elapsed: 3.655000 seconds
maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ python testpython.py
so long...
What do you know
fool
chicken crosses road
Elapsed: 3.077764 seconds
Cygwin version is slower, but not too much.
C++, compiled with VS2005, release, no other configuration :
============================================================
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...
strings : 16.718 # memory allocation is quite bad
unique strings : 1.188
compared by address : 0.453
C++, with gcc 3.3/Cygwin
========================
maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ g++ -O3 -o testcpp testcpp.cpp
maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ ./testcpp
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...
strings : 17.266 # still bad
unique strings : 1.547
compared by address : 0.375
Hum, with my old gcc I get equal or better performances than with VS2005.
Finally, the equivalent code is still about 10x faster in c++ than in
Python, as it was on my Linux box.
Mc Osten a écrit :
...
> However, I would have written the code using a proper compare function
> rather than using two sets. In this particular case the number of
> elements of the first set is negligible in respect of the initial vector
> size, thus copying it again does not take a lot of time.
> But such code is optimized for the problem itself: in the real world I
Of course, it's a quick hack to get the same behavior.
> suppose we would have passed set a proper comparison function that
> checks address and then string equality.
>
Yes, furthermore, that is *exactly* what the python code is doing (see
my other post to Fredrik).
Maybe this can be useful to speed up C++ hashing (now used in ShedSkin
too):
http://www.azillionmonkeys.com/qed/hash.html
I suggest to test this code with the D language too, it has built-in
dicts too (associative arrays, impleented with trees and not hashes),
so the source code needed is pretty short.
Bye,
bearophile
> Yeah, my guess would be either he used the Debug configuration or he
> actually created a Managed executable instead of a pure Win32
> application. Sigh, now I can't wait to get home and try it out :)
Can be. But I suppose a Managed should not get *that* slow.
IronPython on Tim's machine is still faster than C++ (even though not as
fast as CPython).
I have to admit to a stupid mistake, for which I feel quite ashamed - I
got the loop-size wrong in the Python code. So all Python results
posted by me were off by a factor of 10 :-(
I feel quite bad about that!
With the nr of loops corrected, Python on my laptop performs worse than
C++ under all circumstances, by a factor of about 2:
============ Python 2.4 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python24/python.exe SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 4000000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 4.239721 seconds
Slow - Elapsed: 11.883234 seconds
============ Python 2.5 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python25/python.exe SpeedTest.py
Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 4000000
so long...
What do you know
fool
chicken crosses road
Fast - Elapsed: 4.031873 seconds
Slow - Elapsed: 11.314742 seconds
============ GCC 3.4.5, MinGW, -O6 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 2.088 seconds
Slow - Elapsed: 7.033 seconds
============ VC++ 6, 'release' build =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest_VC.exe
Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Fast - Elapsed: 4.585 seconds
Slow - Elapsed: 5.024 seconds
========== GCC 3.4.5, MinGW, -O6, with most optimized C++ code
==========
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./testcpp.exe
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...
strings : 2.338
unique strings : 1.109
compared by address : 0.23
LeeuwT@nlshl-leeuwt ~/My Documents/Python
============ IronPython 1.0rc2 =============
IronPython had a hard time coping with it; creating 4 million string
objects is a bit too much and the CLR was eating nearly a gigabyte of
memory near the end.
Here are the numbers:
IronPython 1.0.60816 on .NET 2.0.50727.42
Copyright (c) Microsoft Corporation. All rights reserved.
>>> import sys
>>> sys.path.append('C:/Documents and Settings/LeeuwT/My Documents/Python')
>>> import SpeedTest
>>> SpeedTest.run_test()
Begin Test
Number of unique string objects: 4
What do you know
so long...
chicken crosses road
fool
Number of unique string objects: 4000000
What do you know
so long...
chicken crosses road
fool
Fast - Elapsed: 10.501273 seconds
Slow - Elapsed: 371.047343 seconds
>>>
============ Java 1.6.0 b2 =============
Set size: 4
chicken crosses road
What do you know
fool
so long...
Set size: 4
chicken crosses road
What do you know
fool
so long...
Fast - Elapsed 1.003 seconds
Slow - Elapsed 3.96 seconds
============ Java 1.5.0 =============
Set size: 4
fool
What do you know
so long...
chicken crosses road
Set size: 4
fool
What do you know
so long...
chicken crosses road
Fast - Elapsed 1.754 seconds
Slow - Elapsed 5.044 seconds
=========================
Note that the Python code creates a set of all unique id's of all
objects in list a, and prints the length of this set, to verify that
all strings are really unique instances or duplicate instances. The C++
versions don't do that (at least not for 4 million strings); so Python
is at a slight disadvantage here. Printing the number of strings still
didn't help me catch the off-by-ten errors though.
I included a Java version of the program, and it looks like it performs
quite well compared to C++ both with jdk1.5 and jdk1.6.
I humbly apologize for my misinformation yesterday.
Regards,
--Tim
*Phew*
Great to know that my model of how the world works is still correct!
(at least in relation to Python and C++!) :)
Thanks,
Ray
Perhaps more optimal. We've had (tim)bots working on the problem for years.
Skip
Harald> Of course not. Python is faster then assembler. Proofed @
Harald> EuroPython 2006 in CERN, near the LHC Beta, in the same room
Harald> many Nobel laurates gave their presentations before.
Harald, do you have a reference to a talk abstract or a paper?
Thx,
Skip
Via the lightning talks page:
http://wiki.python.org/moin/EuroPy2006LightningTalks
Here's a direct link:
Of course, Harald's interpretation of the "faster than" qualification
is broader than a pure assessment of raw performance, so I wouldn't
expect too many technical revelations. And while footage exists of at
least one talk in the CERN Auditorium which led to a Nobel prize, I
don't think that any footage of the EuroPython lightning talks (or of
any of the other talks) has been released just yet.
Paul
- This code runs amortized 100ms on my machien (vc.net 2003 pro,
dinkumware stl, p4m 2.2GHz thinkpad, windows 2003 server), (10 loops in
1000ms)
- with STLPort 5.2, this code runs in 39-43ms (10 loops in 390ms ->
430ms).
a. did you compile in release mode? and if yes, vc.net _standard_
edition has no optimizing compiler in release mode, you need vc.net pro
or enterprise. or did you use vc.net 2005 ?
you should also include <algorithm> for ostream_operator.
Hi, you are benching heap allocations and of course heap fragmentation.
this is what devpartner c++ profiler had to say:
Method %in % with Called Average Real Name Method Children
-----------------------------------------------------------------
RtlFreeHeap 52,8 52,8 81.034 7,6 7,7
RtlAllocateHeap 19,8 19,8 81.089 2,9 2,9
main 16,9 89,7 1 198083,2 1052376,0
ExitProcess 10,3 10,3 1 120616,8 120641,3
...
So on linux its a lot better than that, because - I think - ptmalloc is
used which is based on dmalloc which is efficient for those small size
allocs (which malloc() and free() were never designed for).
on win32, activate the low fragmenting heap or write a custom stl
allocator.
_set_sbh_threshold(128); // method 1
ULONG HeapFragValue = 2;
HeapSetInformation((HANDLE)_get_heap_handle(), // method 2
HeapCompatibilityInformation,
&HeapFragValue,
sizeof(HeapFragValue));
last non-python message from me :)
> I have to admit to a stupid mistake, for which I feel quite ashamed - I
> got the loop-size wrong in the Python code. So all Python results
> posted by me were off by a factor of 10 :-(
> I feel quite bad about that!
Well, this makes *my* results quite surprising.
I checked it threetimes. I loop 1000000 times in both Python and C++,
and Python here is faster.
> Great to know that my model of how the world works is still correct!
> (at least in relation to Python and C++!) :)
So please explain my results. I loop the same number of times.
<ostream>, actually.
--
Neil Cerutti
If you're really eager to see python succeed with this 'test' or 'benchmark', if you will, you can look at the results with VC7.1. It clearly shows that much depends on the STL implementation, you use.
I use the implementations posted by Maric Michaud in http://mail.python.org/pipermail/python-list/2006-August/357593.html. The tests, for which I appended 'with hash', use a hash_set instead of a plain set.
1) vc71 with dinkumware STL (which is vc's default STL):
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_strings_with_hash
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_with_hash
What do you know?
chicken crosses road
fool
so long...
strings : 10.921 <---- Oops ...
unique strings : 1.047
compared by address : 0.328
strings with hash : 11.484 <---- Oooops ...
unique strings with hash : 1.016
2) vc7.1 with STLport 5.02:
print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_strings_with_hash <--- reorders
fool
What do you know?
chicken crosses road
so long...
print_occurence_of_unique_strings_with_hash <--- reorders
so long...
What do you know?
chicken crosses road
fool
strings : 2.156
unique strings : 0.953
compared by address : 0.187
strings with hash : 2.078
unique strings with hash : 0.922
This seems to substantiate this post http://sourceforge.net/mailarchive/forum.php?thread_id=30201462&forum_id=46178.
3) python 4.2 (python test.py; interestingly this is faster than with -OO)
so long...
What do you know
fool
chicken crosses road
Elapsed: 2.278332 seconds
It seems the unification of the strings via hash is not significantly better (compared to memcpy or whatever is used).
The results for the dinkumware STL are dominated by the sheer number of calls to HeapAlloc/HeapFree and to EnterCriticalSection/LeaveCriticalSection (I compiled with multithreading and linked statically). The python version won't be concerned with the latter, I guess (although just by importing 'threading' performance gets about 3.5% worse).
STLport uses for example InterlockedExchange (obviously in the default allocator) and several orders of magnitude fewer calls to the heap allocation API.
It all boils down to the fact that I can write C++ or even assembler programs that perform worse (by any degree I want) than a functionally equivalent python program. That's a trivial insight and it certainly doesn't provide evidence that python 'is faster' than c++/assembler. Even if that's just because it *isn't* and can't be.
I always wonder where this rage comes from to prove oneself or whomever that python is the best programming language in any regard as if it hadn't enough other advantages. It is certainly not the fastest/most memory efficient tool out there. One of the biggest advantages of python to me is that, even if I run into a hotspot, the transition from python to C/C++ is so easy. That means that in spite of its warts (GIL, memory efficiency ...) I don't easily come to a dead end using python. This fact lets me use python's great strengths (like programmer efficiency etc.) without tossing and turning sleeplessly in my bed.
cheers,
aa
--
Andreas Ames | Programmer | Comergo GmbH |
Voice: +49 69 7505 3213 | ames AT avaya DOT com
Those of you experiencing a temporary obsession with this topic
are encouraged to study The Great Language Shootout, until the
obsession goes away. ;)
Your time might not be totally wasted; an opportunity to improve
the Python solutions may present itself.
http://shootout.alioth.debian.org/
--
Neil Cerutti
Strangely, in slow motion replay, the ball seemed to hang in the
air for even longer. --David Acfield
And which for Python has been further optimized in obmalloc. Again, Python
wins. ;-)
Skip
> Those of you experiencing a temporary obsession with this topic
> are encouraged to study The Great Language Shootout, until the
> obsession goes away. ;)
I think that everybody knows GLS. However, when I have results different
from what I expected, I try to understand where I did the wrong
assumption.
However, a recent post kind of explains what the problem is.
> Those of you experiencing a temporary obsession with this topic
> are encouraged to study The Great Language Shootout, until the
> obsession goes away. ;)
I think that everybody knows GLS. However, when I have results different
from what I expected, I try to understand where I did the wrong
assumption.
But a recent post kind of explains what the problem is.
I think you probably want this C++ code instead:
//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;
int main(){
vector<string> a;
a.reserve( 40000 ); //<------------------ note this change
for (long int i=0; i<10000 ; ++i){
a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<string> b(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout,
"\n"));
}
It will run a lot faster if it doesn't have to keep resizing the array.
I don't see why it should run a lot faster that way.
Appending elements to a vector with push_back takes amortized
constant time. In the example above, preallocating 40000 strings
saves (probably) math.log(40000, 2) reallocations of the vector's
storage along with the consequent copy-construction and
destruction of some fixed number of strings. Most of those
reallocations take place while the vector is still proportionally
quite small.
--
Neil Cerutti
Math.log(40000, 2) is not a small number when talking about a
relatively expensive operation such as memory allocation and
deallocation. And the superfluous copying amounts to probably an extra
2^16 copies in this case - not exactly trivial.
--
Ben Sizer
I see your logic, however it does make it run a lot faster. On my
machine, with 1000000 repetition, what used to take 5+ seconds now
takes only about 1+++ to 2 secs.
>
> --
> Neil Cerutti
Actually, I misread the suggestion.
I thought I saw:
vector<string> v;
v.reserve(40000);
instead of:
vector<string> v(40000);
The latter actually *slows* the program, according to my tests. I
think the poster meant to suggest the former, and that's what I
was discussing.
I increased the number of iterations from 10000 to 100000 for the
tests I'm reporting.
The best of five, with reserve, was 2363 clock ticks. Without
reserve, it was 3244. The suggestion to use an initial size
resulted in 3655 clocks.
So it appears you were more right (using reserve sped up the
program by 35%, just about as the numbers predict), although we
were both wrong. ;)
--
Neil Cerutti
8 new choir robes are currently needed, due to the addition of
several new members and to the deterioration of some of the
older ones. --Church Bulletin Blooper
My PC is: 3.6GH Pentium4
My OS is: Ubuntu 6.06 i386
My lisp implementation is SBCL 0.9.8 for i386
My python's version is: V2.4.3
Both implementations were installed via apt-get install.
Here's my Lisp program:
++++++++++++++++++++++++++++++++++++++++++++++++++
;;; from slow to fast implementation
(proclaim '(optimize (speed 3) (debug 0) (safety 0) (space 0)))
(defun test1 ()
(let ((a (make-array 0 :adjustable t :fill-pointer 0))
(b nil))
(dotimes (i 1000000)
(progn
(vector-push-extend "What do you know" a)
(vector-push-extend "so long ..." a)
(vector-push-extend "chicken crosses road" a)
(vector-push-extend "fool" a)))
(setf b (remove-duplicates a :test #'string-equal))
(map 'vector #'print b)))
(defun test2 ()
(let ((a (make-array 0 :adjustable t :fill-pointer 0))
(b nil))
(dotimes (i 1000000)
(progn
(vector-push-extend "What do you know" a)
(vector-push-extend "so long ..." a)
(vector-push-extend "chicken crosses road" a)
(vector-push-extend "fool" a)))
(setf b (remove-duplicates a :test #'eq))
(map 'vector #'print b)))
(defun test3 ()
(let ((a (make-array 4000000 :element-type 'string
:adjustable nil
:fill-pointer 0))
(b nil))
(dotimes (i 1000000)
(progn
(vector-push "What do you know" a)
(vector-push "so long ..." a)
(vector-push "chicken crosses road" a)
(vector-push "fool" a)))
(setf b (remove-duplicates a))
(map 'vector #'print b)))
(defun test4 ()
(let ((a (make-array 4000000 :element-type 'string
:adjustable nil))
(b nil))
(dotimes (i 1000000)
(progn
(let ((j (1- (* 4 i))))
(setf (aref a (incf j)) "What do you know")
(setf (aref a (incf j)) "so long ...")
(setf (aref a (incf j)) "chicken crosses road")
(setf (aref a (incf j)) "fool"))))
(setf b (remove-duplicates a))
(map 'vector #'print b)))
+++++++++++++++++++++++++++++++++++++++++++++++++++++++
1. When using string equal comparator (test #1), the speed slows down
dramatically. 1.93 -> 9.35
2. It seems that append(vector-push) in lisp costs much more than
direct access via index.
Here's my benchmark result:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++
#1
(time (test1))
"What do you know"
"so long ..."
"chicken crosses road"
"fool"
Evaluation took:
9.346 seconds of real time
9.020564 seconds of user run time
0.328021 seconds of system run time
0 page faults and
457,565,688 bytes consed.
#2
(time (test2))
"What do you know"
"so long ..."
"chicken crosses road"
"fool"
Evaluation took:
1.931 seconds of real time
1.884118 seconds of user run time
0.048003 seconds of system run time
0 page faults and
49,561,312 bytes consed.
#3
(time (test3))
"What do you know"
"so long ..."
"chicken crosses road"
"fool"
Evaluation took:
1.721 seconds of real time
1.704107 seconds of user run time
0.016001 seconds of system run time
0 page faults and
32,012,040 bytes consed.
#4
(time (test4))
"What do you know"
"so long ..."
"chicken crosses road"
"fool"
Evaluation took:
0.843 seconds of real time
0.824051 seconds of user run time
0.020001 seconds of system run time
0 page faults and
32,012,040 bytes consed.
+++++++++++++++++++++++++++++++++++++++++++++++++++
Here's my python's code:
ef f():
a = []
for i in range(1000000):
a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s
import time
from time import clock
f_start = clock()
f()
f_end = clock()
print "Elapsed: %f seconds" % (f_end - f_start)
And benchmark result:
python -O test.py
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.260000 seconds
Oops, Python beat native-compiled lisp code.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Any way,
MY CONCLUSION is: C++ is not the right model for symbolic computation.
That test4 function can be compared to this one, with explicit
preallocation (and xrange instead of range!):
def f2():
n = 1000000
a = [None] * n * 4
for i in xrange(0, n*4, 4):
a[i] = 'What do you know'
a[i+1] = 'so long...'
a[i+2] = 'chicken crosses road'
a[i+3] = 'fool'
for s in set(a):
print s
But note this a list (that is an array, a list is a different data
structure) of python becomes filled with pointers. I don't know what
your CL does exactly.
I can also suggest you to use Psyco too here
(http://psyco.sourceforge.net/):
import psyco
psyco.bind(f2)
It makes that f2 more than twice faster here.
Bye,
bearophile
I heard that python's list is implemented as adjustable array.
Here's my lisp implementation:
++++++++++++++++++
(defun test-list ()
(let ((a nil)
(b nil))
(dotimes (i 1000000)
(progn
(push "What do you know" a)
(push "so long ..." a)
(push "chicken crosses road" a)
(push "fool" a)))
(setf b (remove-duplicates a))
(map 'list #'print b)))
+++++++++++++++++++++++++
And the benchmark result:
(time (test-list))
"fool"
"chicken crosses road"
"so long ..."
"What do you know"
Evaluation took:
2.88 seconds of real time
2.744172 seconds of user run time
0.136009 seconds of system run time
0 page faults and
74,540,392 bytes consed.
++++++++++++++++++++++++++++++
BTW, I couldn't install psyco on my system (ubuntu), gcc just prompt to
me thousands of lines of errors and warnings.
Your python's example (use direct index array index) of my
corresponding lisp code works slower than the version which use
'append'.
This let me think how python's list is implemented.
Anyway, python's list is surprisingly efficient.
What do you know
fool
chicken crosses road
f elapsed: 1.260000 seconds
f2 elapsed 2.110000 seconds
>I heard that python's list is implemented as adjustable array.
Correct, an array geometrically adjustable on the right.
>Here's my lisp implementation:<
What's the memory size of a before computing b? You can compare it with
Python, that may need less memory (because the array contains
pointers).
>BTW, I couldn't install psyco on my system (ubuntu), gcc just prompt to me thousands of lines of errors and warnings.<
Find a Win box ;-) It's already compiled for it (for Py 2.3, 2.4).
>Your python's example (use direct index array index) of my corresponding lisp code works slower than the version which use 'append'.<
For me (a slow PC) it's almost twice faster, computer life is usually
complex.
For me using the esplicit allocation + Psyco makes that program about 4
times faster (from 8 to 2 seconds).
>This let me think how python's list is implemented.<
You also have to think how the * allocation is implemented and many
other things :-)
The list implementation is rather readable, Python sources are online
too.
>Anyway, python's list is surprisingly efficient.<
But its access isn't that fast :-) Psyco helps.
Bye,
bearophile
Here's the memory usage:
1) before the loop ( fully garbage collected)
29,052,560 bytes, 757,774 objects.
2) after the loop
103,631,952 bytes, 8,760,495 objects.
It seems A has consumed 74M bytes, 8bytes each cell. That make sense
because a cell in list consists of 2 pointers, (car cdr), and an mem
address is 32 bit.
Don't worry.
For me using Py 2.4.3 those memory values are 4536 before and 20184 kb
after, it means a difference of 15648 kb, that equals to about 16023552
bytes, that equals to about 1000000 * 4 * 4. That means 4 bytes for
each string reference into the array. If you don't use Psyco the
starting memory value is quite lower.
If you don't use Psyco but you use the append, the values are 1384 and
18880, this means about 4.47 bytes / value, because python lists grow
geometrically, so there is some wasted space.
I find the memory used with PsList called from the Python script:
http://www.sysinternals.com/Utilities/PsList.html
Bye,
bearophile
Here's my code:
++++++++++++++++++++++++++++++++++++++++
require 'time'
def f
a = []
1000000.times do
a.push "What do you know"
a.push "so long ..."
a.push "chicken crosses road"
a.push "fool"
end
b = a.uniq
b.each do |x|
puts x
end
end
def f2
a = Array.new(4000000)
1000000.times do |i|
a[i] = "What do you know"
a[i+1] = "so long ..."
a[i+2] = "chicken crosses road"
a[i+3] = "fool"
end
b = a.uniq
b.each do |x|
puts x
end
end
f_start = Time.now
f
f_end = Time.now
f2_start = Time.now
f2
f2_end = Time.now
puts "f: Elapsed time: #{f_end - f_start} sec."
puts "f2: Elapsed time: #{f2_end - f_start} sec."
++++++++++++++++++++++++++++++++++++++++++
And the benchmark result:
What do you know
so long ...
chicken crosses road
fool
What do you know
so long ...
chicken crosses road
fool
nil
f: Elapsed time: 3.600294 sec.
f2: Elapsed time: 11.182927 sec.
++++++++++++++++++++++++++++++++++++++++++
I like ruby because its purity. :p
Licheng Fang wrote:
> Hi, I'm learning STL and I wrote some simple code to compare the
> efficiency of python and STL.
>
> //C++
> #include <iostream>
> #include <string>
> #include <vector>
> #include <set>
> #include <algorithm>
> using namespace std;
>
> int main(){
> vector<string> a;
> for (long int i=0; i<10000 ; ++i){
> a.push_back("What do you know?");
> a.push_back("so long...");
> a.push_back("chicken crosses road");
> a.push_back("fool");
> }
> set<string> b(a.begin(), a.end());
> unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
> }
>
> #python
> def f():
> a = []
> for i in range(10000):
> a.append('What do you know')
> a.append('so long...')
> a.append('chicken crosses road')
> a.append('fool')
> b = set(a)
> for s in b:
> print s
>
> I was using VC++.net and IDLE, respectively. I had expected C++ to be
> way faster. However, while the python code gave the result almost
> instantly, the C++ code took several seconds to run! Can somebody
> explain this to me? Or is there something wrong with my code?
using namespace std;
int main()
{
DWORD ticks = ::GetTickCount();
const string s1("What do you know");
const string s2("So long...");
const string s3("chicken crosses road");
const string s4("fool");
typedef vector<const string*> str_vector_t;
typedef str_vector_t::iterator vec_iter;
typedef set<const string*> str_set_t;
typedef str_set_t::const_iterator set_iter;
const int size = 1000000;
str_vector_t vec;
vec.reserve(size*4);
for(int i = 0; i < size; ++i){
vec.push_back(&s1);
vec.push_back(&s2);
vec.push_back(&s3);
vec.push_back(&s4);
}
vec_iter new_end = unique(vec.begin(), vec.end());
str_set_t set_(vec.begin(), new_end);
for(set_iter it = set_.begin(); it != set_.end(); ++it){
cout<<*it<<endl;
}
cout<<::GetTickCount()-ticks<<endl;
return 0;
}
In MS VC+ 2005 in release configuration it gets the work done in 187
ms.
---------------- Python + Psyco----------------
def f():
a = []
for i in range(1000000):
a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s
import time
from time import clock
import psyco
psyco.full()
f_start = clock()
f()
f_end = clock()
print "Elapsed: %f seconds" % (f_end - f_start)
In Python in manages to do the same in 1.8 secs (psyco boosts it to
0.7; see below)
so long...
What do you know
fool
chicken crosses road
Elapsed: 0.772899 seconds
Well, that's how it is in my book.
Regards,
Andrei