Python and STL efficiency

Licheng Fang

unread,

Aug 21, 2006, 2:52:16 AM8/21/06

to

Hi, I'm learning STL and I wrote some simple code to compare the
efficiency of python and STL.

//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;

int main(){
vector<string> a;
for (long int i=0; i<10000 ; ++i){
a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<string> b(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}

#python
def f():
a = []
for i in range(10000):
a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s

I was using VC++.net and IDLE, respectively. I had expected C++ to be
way faster. However, while the python code gave the result almost
instantly, the C++ code took several seconds to run! Can somebody
explain this to me? Or is there something wrong with my code?

Marc 'BlackJack' Rintsch

unread,

Aug 21, 2006, 3:48:27 AM8/21/06

to

In <1156143136.0...@i42g2000cwa.googlegroups.com>, Licheng Fang
wrote:

> Hi, I'm learning STL and I wrote some simple code to compare the
> efficiency of python and STL.
>
> //C++
> #include <iostream>
> #include <string>
> #include <vector>
> #include <set>
> #include <algorithm>
> using namespace std;
>
> int main(){
> vector<string> a;
> for (long int i=0; i<10000 ; ++i){
> a.push_back("What do you know?");
> a.push_back("so long...");
> a.push_back("chicken crosses road");
> a.push_back("fool");
> }
> set<string> b(a.begin(), a.end());
> unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
> }

Why are you using `unique_copy` here?

> #python
> def f():
> a = []
> for i in range(10000):
> a.append('What do you know')
> a.append('so long...')
> a.append('chicken crosses road')
> a.append('fool')
> b = set(a)
> for s in b:
> print s
>
> I was using VC++.net and IDLE, respectively. I had expected C++ to be
> way faster. However, while the python code gave the result almost
> instantly, the C++ code took several seconds to run! Can somebody
> explain this to me? Or is there something wrong with my code?

There's a difference in data structures at least. The Python `set` type
is implemented with a hash algorithm, so the equivalent STL type would be
`hash_set`. `set` in Python does not store its contents sorted.

Ciao,
Marc 'BlackJack' Rintsch

Licheng Fang

unread,

Aug 21, 2006, 4:09:45 AM8/21/06

to

Marc 'BlackJack' Rintsch wrote:
> In <1156143136.0...@i42g2000cwa.googlegroups.com>, Licheng Fang
> wrote:
>
> > Hi, I'm learning STL and I wrote some simple code to compare the
> > efficiency of python and STL.
> >
> > //C++
> > #include <iostream>
> > #include <string>
> > #include <vector>
> > #include <set>
> > #include <algorithm>
> > using namespace std;
> >
> > int main(){
> > vector<string> a;
> > for (long int i=0; i<10000 ; ++i){
> > a.push_back("What do you know?");
> > a.push_back("so long...");
> > a.push_back("chicken crosses road");
> > a.push_back("fool");
> > }
> > set<string> b(a.begin(), a.end());
> > unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
> > }
>
> Why are you using `unique_copy` here?

Sorry, that's a typo. Actually I used 'copy'.

>
> > #python
> > def f():
> > a = []
> > for i in range(10000):
> > a.append('What do you know')
> > a.append('so long...')
> > a.append('chicken crosses road')
> > a.append('fool')
> > b = set(a)
> > for s in b:
> > print s
> >
> > I was using VC++.net and IDLE, respectively. I had expected C++ to be
> > way faster. However, while the python code gave the result almost
> > instantly, the C++ code took several seconds to run! Can somebody
> > explain this to me? Or is there something wrong with my code?
>
> There's a difference in data structures at least. The Python `set` type
> is implemented with a hash algorithm, so the equivalent STL type would be
> `hash_set`. `set` in Python does not store its contents sorted.
>
> Ciao,
> Marc 'BlackJack' Rintsch

Thank you for your comments. I tested with hash_set, but I didn't see
much performance improvement. When I increased the loop to 1 million
times, the python code still ran reasonably fast and the C++ code got
stuck there. This totally surprised me, because according this page
http://norvig.com/python-lisp.html, the speed of python is nowhere near
that of C++.

Tim N. van der Leeuw

unread,

Aug 21, 2006, 4:24:14 AM8/21/06

to

Licheng Fang wrote:
> Hi, I'm learning STL and I wrote some simple code to compare the
> efficiency of python and STL.
>
>

> I was using VC++.net and IDLE, respectively. I had expected C++ to be
> way faster. However, while the python code gave the result almost
> instantly, the C++ code took several seconds to run! Can somebody
> explain this to me? Or is there something wrong with my code?

Hi,

I'm no C++ guru so cannot comment on the C++ code itself, however I do
wonder if you tested your C++ code with other STL implementation such
as gcc (gcc is available on windows as well, in various versions).

What could be is that expanding the list in C++ is done in very small
increments, leading to many re-allocations. Is it possible to
pre-allocate the vector<> with sufficient entries?

Also, your Python code as quoted, doesn't actually call your function
f(). If you say that you get results instantly, I assume that you mean
all 4 strings are actually printed to console?

(I'm surprised that the console prints things that fast).

btw, using range() in Python isn't very efficient, I think... Better to
use xrange().

Asked a C++ collegue of mine to comment, and he strongly suspects that
you're actually running it in the .Net runtime (your C++ code contains
some C#-isms, such as omitting the '.h' in the include <> statements).

Luck,

--Tim

Tim N. van der Leeuw

unread,

Aug 21, 2006, 4:28:07 AM8/21/06

to

Marc 'BlackJack' Rintsch wrote:
> In <1156143136.0...@i42g2000cwa.googlegroups.com>, Licheng Fang
> wrote:
>
> > Hi, I'm learning STL and I wrote some simple code to compare the
> > efficiency of python and STL.
> >

[...]

>
> There's a difference in data structures at least. The Python `set` type
> is implemented with a hash algorithm, so the equivalent STL type would be
> `hash_set`. `set` in Python does not store its contents sorted.
>

The set should be only 4 items in size, according to my reading of the
code, so set implementation differences shouldn't lead to drastic
performance differences.

> Ciao,
> Marc 'BlackJack' Rintsch

Cheers,

--Tim

Marc 'BlackJack' Rintsch

unread,

Aug 21, 2006, 4:45:10 AM8/21/06

to

In <1156148654....@p79g2000cwp.googlegroups.com>, Tim N. van der
Leeuw wrote:

> (your C++ code contains some C#-isms, such as omitting the '.h' in the
> include <> statements).

That's no C#-ism, that's C++. The standard C++ header names don't have a
trailing '.h'. ``gcc`` prints deprecation warnings if you write the
names with '.h'.

Ciao,
Marc 'BlackJack' Rintsch

Tim N. van der Leeuw

unread,

Aug 21, 2006, 4:50:23 AM8/21/06

to

We stand corrected.

--Tim

Fredrik Lundh

unread,

Aug 21, 2006, 5:36:37 AM8/21/06

to pytho...@python.org

Licheng Fang wrote:

> I was using VC++.net and IDLE, respectively. I had expected C++ to be
> way faster. However, while the python code gave the result almost
> instantly, the C++ code took several seconds to run! Can somebody
> explain this to me? Or is there something wrong with my code?

in the Python example, the four strings in your example are shared, so
you're basically copying 40000 pointers to the list.

in the C++ example, you're creating 40000 string objects.

</F>

Ray

unread,

Aug 21, 2006, 5:38:49 AM8/21/06

to

How did you compile the C++ executable? I assume that it is Release
mode? Then are the optimization switches enabled? Is it compiled as
Native Win32 or Managed application?

I suspect that other than what other posters have suggested about your
code, the difference in speed is due to the way you build your C++
executable...

HTH,
Ray

Peter Otten

unread,

Aug 21, 2006, 5:37:39 AM8/21/06

to

Licheng Fang wrote:

> Hi, I'm learning STL and I wrote some simple code to compare the
> efficiency of python and STL.

> I was using VC++.net and IDLE, respectively. I had expected C++ to be

> way faster. However, while the python code gave the result almost
> instantly, the C++ code took several seconds to run! Can somebody
> explain this to me? Or is there something wrong with my code?

Just a guess: immutable strings might be Python's advantage. Due to your
"benchmark"'s simplicity you end up with 10000 string instances in C++ and
just four str-s (and a lot of pointers) in Python.

What happens if you replace 'string' with 'const char *' in C++ ?
(Note that this modification is a bit unfair to Python as it would not
detect equal strings in different memory locations)

Peter

Ray

unread,

Aug 21, 2006, 8:10:37 AM8/21/06

to

Fredrik Lundh wrote:
> in the Python example, the four strings in your example are shared, so
> you're basically copying 40000 pointers to the list.
>
> in the C++ example, you're creating 40000 string objects.
>
> </F>

In which case, Licheng, you should try using the /GF switch. This will
tell Microsoft C++ compiler to pool identical string literals together.

:)

Fredrik Lundh

unread,

Aug 21, 2006, 8:34:03 AM8/21/06

to pytho...@python.org

Ray wrote:

>> in the C++ example, you're creating 40000 string objects.
>

> In which case, Licheng, you should try using the /GF switch. This will
> tell Microsoft C++ compiler to pool identical string literals together.

in what way does that change the implementation of C++'s string type ?

</F>

Tim N. van der Leeuw

unread,

Aug 21, 2006, 8:58:21 AM8/21/06

to

The code still creates a new string - instance each time it tries to
append a const char* to the vector<string> ...

You should instead create the string-objects ahead of time, outside of
the loop.

Regards,

--Tim

Jeremy Sanders

unread,

Aug 21, 2006, 9:09:18 AM8/21/06

to

Licheng Fang wrote:

> I was using VC++.net and IDLE, respectively. I had expected C++ to be
> way faster. However, while the python code gave the result almost
> instantly, the C++ code took several seconds to run! Can somebody
> explain this to me? Or is there something wrong with my code?

It must be the debugging, the compiler or a poor STL implementation. With
gcc 4 it runs instantly on my computer (using -O2), even with 10x the
number of values.

If the problem is that C++ has to make lots of new strings, as other posters
have suggested, then you could do something like

const string foo = "What do you know?";

for (long int i=0; i<10000 ; ++i){

a.push_back(foo);
...
}

as many C++ implementations use reference counting for identical strings.

Jeremy

--
Jeremy Sanders
http://www.jeremysanders.net/

Christophe

unread,

Aug 21, 2006, 9:16:10 AM8/21/06

to

Jeremy Sanders a écrit :

As a matter of fact, do not count on that. Use a vector<string*> just in
case.

Tim N. van der Leeuw

unread,

Aug 21, 2006, 9:16:52 AM8/21/06

to

Tim N. van der Leeuw wrote:

Alternatively, slow down the Python implementation by making Python
allocate new strings each time round:

a.append('%s' % 'What do you know')

... for each of your string-appends. But even then, the python-code is
still near-instant.

Cheers,

--Tim

Ray

unread,

Aug 22, 2006, 4:37:10 AM8/22/06

to

Ah, yes what was I thinking? The fact that it stores std::string
objects escaped my mind somehow. /GF just pools the string literals.
Thanks for the correction.

>
> </F>

Ray

unread,

Aug 22, 2006, 4:37:56 AM8/22/06

to

Tim N. van der Leeuw wrote:

> > In which case, Licheng, you should try using the /GF switch. This will
> > tell Microsoft C++ compiler to pool identical string literals together.
> >
> >
> > :)
>
> The code still creates a new string - instance each time it tries to
> append a const char* to the vector<string> ...

Yeah, you're right... I've been programming Java too long :)

Tim N. van der Leeuw

unread,

Aug 22, 2006, 5:26:05 AM8/22/06

to

Ray wrote:
> Tim N. van der Leeuw wrote:
> > > In which case, Licheng, you should try using the /GF switch. This will
> > > tell Microsoft C++ compiler to pool identical string literals together.
> > >
> > >
> > > :)
> >
> > The code still creates a new string - instance each time it tries to
> > append a const char* to the vector<string> ...
>
> Yeah, you're right... I've been programming Java too long :)
>

Took me a while to see that too! Have been programming too much Java /
Python as well. Anyways, when changing the Python version so that it
adds 40.000 unique strings to the list (and proving that there are
indeed 40.000 unique ids in the list, by making a set of all id()s in
the list and taking the len() of that set), it still takes at most a
second. I cannot test the speed of the c++ version on my computer, so
nothing scientific here.

I'm curious though, if on the OP's machine the slowed-down Python
version is still faster than the C++ version.

Cheers,

--Tim

Mc Osten

unread,

Aug 22, 2006, 6:55:35 AM8/22/06

to

Jeremy Sanders <jeremy+com...@jeremysanders.net> wrote:

> It must be the debugging, the compiler or a poor STL implementation. With
> gcc 4 it runs instantly on my computer (using -O2), even with 10x the
> number of values.

$ gcc --version
i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build
5363)

I adapted original poster's code and made a function that did not create
strings each time. The NoisyString is a class we can use to actually
track copying.

In fact Python here is faster. Suppose it has a really optimized set
class...

Here some results (I know that the fpoint optimizations are useless...
it's is my "prebuilt" full optimization macro :) ):

$ g++ -O3 -pipe -O2 -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 5.8
Elapsed 1.71

$ g++ -Os -pipe -O2 -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl

What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 5.8
Elapsed 1.71

$ g++ -O3 -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 0.47
Elapsed 0.18

$ g++ -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 0.63
Elapsed 0.33

$ python -O set_impl.py
so long...
What do you know
fool
chicken crosses road
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.370000 seconds
Elapsed: 3.810000 seconds

------------------- PYTHON CODE ---------------------------------
#python

global size
size = 1000000

def f():
a = []
for i in range(size):

a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s

def slow_f():
a = []
for i in range(size):
a.append('%s' % 'What do you know')
a.append('%s' % 'so long...')
a.append('%s' % 'chicken crosses road')
a.append('%s' % 'fool')

b = set(a)
for s in b:
print s

import time
from time import clock

f_start = clock()
f()
f_end = clock()

slow_f_start = clock()
slow_f()
slow_f_end = clock()

print "Elapsed: %f seconds" % (f_end - f_start)
print "Elapsed: %f seconds" % (slow_f_end - slow_f_start)

------------------------------------------------------------------

----------------- CPP CODE -------------------------------------
#include <iostream>
#include <ostream>
#include <iterator>

#include <string>
#include <vector>
#include <set>
#include <algorithm>

#include <ctime>
using namespace std;

#define SIZE 1000000

class NoisyString : public std::string {
public:
NoisyString(const string& cp)
: string(cp)
{
cout << "Fuck I got copied!" << endl;
}

NoisyString(const char* s ) : string(s) {

}

};

void f(){
vector<string> a;
for (long int i=0; i<SIZE ; ++i){

a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<string> b(a.begin(), a.end());

copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}

void fast_f(){
vector<string> a;
string s1 = "What do you know?" ;
string s2 = "so long..." ;
string s3 = "chicken crosses road";
string s4 = "fool" ;
for (long int i=0; i<SIZE ; ++i){
a.push_back(s1);
a.push_back(s2);
a.push_back(s3);
a.push_back(s4);
}
set<string> b(a.begin(), a.end());

copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}

int main(){
clock_t f_start,
f_end,
faster_f_start,
faster_f_end,
fast_f_start,
fast_f_end;

f_start = clock();
f();
f_end = clock();

fast_f_start = clock();
fast_f();
fast_f_end = clock();

cout << "Elapsed " << (f_end - f_start) / double(CLOCKS_PER_SEC) <<
endl;
cout << "Elapsed " << (fast_f_end - fast_f_start) /
double(CLOCKS_PER_SEC) << endl;

}

-----------------------------------------------------------------------

--
blog: http://www.akropolix.net/rik0/blogs | Uccidete i filosofi,
site: http://www.akropolix.net/rik0/ | tenetevi riso e
forum: http://www.akropolix.net/forum/ | bacchette per voi.

Mc Osten

unread,

Aug 22, 2006, 6:55:36 AM8/22/06

to

Tim N. van der Leeuw <tim.lee...@nl.unisys.com> wrote:

> I'm curious though, if on the OP's machine the slowed-down Python
> version is still faster than the C++ version.

I tested both on my machine (my other post in the thread)

Fredrik Lundh

unread,

Aug 22, 2006, 7:52:35 AM8/22/06

to pytho...@python.org

"Mc Osten" wrote:

> In fact Python here is faster. Suppose it has a really optimized set
> class...

Python's memory allocator is also quite fast, compared to most generic
allocators...

</F>

Mc Osten

unread,

Aug 22, 2006, 8:32:44 AM8/22/06

to

Fredrik Lundh <fre...@pythonware.com> wrote:

> Python's memory allocator is also quite fast, compared to most generic
> allocators...

In fact also in the two "slow" versions Python outperforms C++.
I didn't notice it in the first place.

Tim N. van der Leeuw

unread,

Aug 22, 2006, 8:39:59 AM8/22/06

to

Mc Osten wrote:
> Fredrik Lundh <fre...@pythonware.com> wrote:
>
> > Python's memory allocator is also quite fast, compared to most generic
> > allocators...
>
> In fact also in the two "slow" versions Python outperforms C++.
> I didn't notice it in the first place.
>

But your C++ program outputs times in seconds, right? So all
compilations except for the first two give results in less than a
second, right? (meaning the optimizations of your standard-compilation
give worst results than -O3?)

BTW, I don't quite understand your gcc optimizations for the first 2
compiles anyways: two -O options with different values. Doesn't that
mean the 2nd -O takes preference, and the compilation is at -O2 instead
of -O3?

Why both -O3 and -O2 at the command-line?

Cheers,

--Tim

Tim N. van der Leeuw

unread,

Aug 22, 2006, 9:09:53 AM8/22/06

to

Well, I guess I'm getting really obsessed with this. But anyways. I
installed MinGW on my Windows-XP (sp2) laptop. It is g++ version 3.4.5
-- ancient, yes, but on windows it's the latest available.

I compiled Mc Osten's C++ program (tweaked the output a little) and ran
it; ran his version of the python code too.
Oh boy; yes indeed the slow python is faster than the fast C++
version... Must be something really awful happening in the STL
implementation that comes with GCC 3.4!

Here's the output from my console:

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ g++ -O3 -march=pentium-m -o SpeedTest SpeedTest.cpp

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.py
Begin Test
Number of unique string objects: 4

so long...
What do you know
fool
chicken crosses road

Number of unique string objects: 40000

so long...
What do you know
fool
chicken crosses road

Fast - Elapsed: 0.037574 seconds
Slow - Elapsed: 0.081520 seconds

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.exe
Begin Test

What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...

Fast - Elapsed: 2.089 seconds
Slow - Elapsed: 6.303 seconds

LeeuwT@nlshl-leeuwt ~/My Documents/Python

Cheers,

--Tim

Jeremy Sanders

unread,

Aug 22, 2006, 9:50:41 AM8/22/06

to

Mc Osten wrote:

> Here some results (I know that the fpoint optimizations are useless...
> it's is my "prebuilt" full optimization macro :) ):

Interesting. The opimisation makes no difference to the speed of the C++ one
for me. I just get

xpc17:~> g++4 -O2 test2.cpp
xpc17:~> ./a.out

What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...

Elapsed 2.11
Elapsed 1.11

(This is with an Althon 64 4600+ running Linux).

Unfortunately the Python on this computer doesn't have set as it is too old,
so I can't compare it.

Mc Osten

unread,

Aug 22, 2006, 11:32:46 AM8/22/06

to

Tim N. van der Leeuw <tim.lee...@nl.unisys.com> wrote:

> But your C++ program outputs times in seconds, right? So all
> compilations except for the first two give results in less than a
> second, right? (meaning the optimizations of your standard-compilation
> give worst results than -O3?)

Yes. It's in seconds but the benchmark that are one order of magnitudo
less than the others have of a different "size" (100000 instead of
1000000). That is cut and paste from my terminal... I think it's a
mess. I do it all again from scratch.

> BTW, I don't quite understand your gcc optimizations for the first 2
> compiles anyways: two -O options with different values. Doesn't that
> mean the 2nd -O takes preference, and the compilation is at -O2 instead
> of -O3?
> Why both -O3 and -O2 at the command-line?

I forgot I put -O2 in my $FAST_FLAGS. I don't know what I was thinking
about.

This the correct version

$ g++ -Os -pipe -march=pentium-m -msse3 -fomit-frame-pointer
-mfpmath=sse -o set_impl set_impl.cpp

$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...

Elapsed 6.3
Elapsed 2.1

$ g++ -O2 -pipe -march=pentium-m -msse3 -fomit-frame-pointer

-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...
Elapsed 5.8
Elapsed 1.7

$ g++ -O3 -pipe -march=pentium-m -msse3 -fomit-frame-pointer

-mfpmath=sse -o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...

Elapsed 5.79
Elapsed 1.72

$ g++ -pipe -march=pentium-m -msse3 -fomit-frame-pointer -mfpmath=sse

-o set_impl set_impl.cpp
$ ./set_impl
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...

Elapsed 7.12
Elapsed 2.98

$ python -O set_impl.py
so long...
What do you know
fool
chicken crosses road
so long...
What do you know
fool
chicken crosses road
Elapsed: 1.370000 seconds

Elapsed: 3.800000 seconds

Mc Osten

unread,

Aug 22, 2006, 11:32:47 AM8/22/06

to

Tim N. van der Leeuw <tim.lee...@nl.unisys.com> wrote:

> Oh boy; yes indeed the slow python is faster than the fast C++
> version... Must be something really awful happening in the STL
> implementation that comes with GCC 3.4!

And the Python version does the very same number of iterations than the
C++ one? I suppose they are looping on arrays of different sizes, just
like my "first version".

Tim N. van der Leeuw

unread,

Aug 22, 2006, 12:57:28 PM8/22/06

to

Mc Osten wrote:
> Tim N. van der Leeuw <tim.lee...@nl.unisys.com> wrote:
>
> > Oh boy; yes indeed the slow python is faster than the fast C++
> > version... Must be something really awful happening in the STL
> > implementation that comes with GCC 3.4!
>
> And the Python version does the very same number of iterations than the
> C++ one? I suppose they are looping on arrays of different sizes, just
> like my "first version".
>

Hmmm.. You're quite right. The C++ version had an array size 100.000
(your version), the Python version still had an array size 10.000 (as
in my modified copy of the original version).

When fixing the Python version to have 100.000 items, like the C++
version, the Python timings are:

Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road

Number of unique string objects: 400000

so long...
What do you know
fool
chicken crosses road

Fast - Elapsed: 0.512088 seconds
Slow - Elapsed: 1.139370 seconds

Still twice as fast as the fastest GCC 3.4.5 compiled version!

Incidentally, I also have a version compiled with VC++ 6 now... (not
yet w/VC++ 7) .. Compiled with release-flags and maximum optimization
for speed, here's the result of VC++ 6:

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest_VC.exe

Begin Test
What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...

Fast - Elapsed: 4.481 seconds
Slow - Elapsed: 4.842 seconds

So you can see that it's 'slow' version of the code is faster than the
'slow' version compiled with GCC, but the 'fast' code is barely faster
than the 'slow' code! And the 'fast' version compiled with GCC is much
faster than the 'fast' version compiled with VC++ 6!

My conclusion from that is, that the vector<> or set<> implementations
of GCC are far superior to those of VC++ 6, but that memory allocation
for GCC 3.4.5 (MinGW version) is far worse than that of MSCRT / VC++ 6.
(And Python still smokes them both).

Cheers,

--Tim

Tim N. van der Leeuw

unread,

Aug 22, 2006, 4:23:50 PM8/22/06

to

Tim N. van der Leeuw wrote:
> Mc Osten wrote:
> > Tim N. van der Leeuw <tim.lee...@nl.unisys.com> wrote:
> >
> > > Oh boy; yes indeed the slow python is faster than the fast C++
> > > version... Must be something really awful happening in the STL
> > > implementation that comes with GCC 3.4!
> >
> > And the Python version does the very same number of iterations than the
> > C++ one? I suppose they are looping on arrays of different sizes, just
> > like my "first version".
> >
>
> Hmmm.. You're quite right. The C++ version had an array size 100.000
> (your version), the Python version still had an array size 10.000 (as
> in my modified copy of the original version).
>
> When fixing the Python version to have 100.000 items, like the C++
> version, the Python timings are:
>

[...]

> Fast - Elapsed: 0.512088 seconds
> Slow - Elapsed: 1.139370 seconds
>
> Still twice as fast as the fastest GCC 3.4.5 compiled version!
>
> Incidentally, I also have a version compiled with VC++ 6 now... (not
> yet w/VC++ 7) .. Compiled with release-flags and maximum optimization
> for speed, here's the result of VC++ 6:
>
> LeeuwT@nlshl-leeuwt ~/My Documents/Python
> $ ./SpeedTest_VC.exe

[...]

> Fast - Elapsed: 4.481 seconds
> Slow - Elapsed: 4.842 seconds
>

[...]

And the results of IronPython (1.0rc2) are just in as well:

IronPython 1.0.60816 on .NET 2.0.50727.42
Copyright (c) Microsoft Corporation. All rights reserved.
>>>
>>> import sys
>>> sys.path.append('c:/documents and settings/leeuwt/my documents/python')
>>> import SpeedTest
>>> SpeedTest.run_test()

Begin Test
Number of unique string objects: 4

What do you know
so long...
chicken crosses road
fool

Number of unique string objects: 400000

What do you know
so long...
chicken crosses road
fool
Fast - Elapsed: 1.287923 seconds
Slow - Elapsed: 4.516272 seconds
>>>

And for Python 2.5:
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python25/python.exe SpeedTest.py

Begin Test
Number of unique string objects: 4
so long...
What do you know
fool
chicken crosses road
Number of unique string objects: 400000
so long...
What do you know
fool
chicken crosses road

Fast - Elapsed: 0.440619 seconds
Slow - Elapsed: 1.095341 seconds

LeeuwT@nlshl-leeuwt ~/My Documents/Python

But beware! For Python2.5 I had to change the code slightly, because it
already realized that the expression

'%s' % 'something'

will be a constant expression, and evaluates it once only... so I had
to replace '%s' with a variable, and I got the timings above which show
Python2.5 to be slightly faster than Python2.4.

(Next step would be to create a VB version and a Java version of the
same program, oh and perhaps to try a version that would work with
Jython... perhaps somehow w/o the 'set')

Cheers,

--Tim

sk...@pobox.com

unread,

Aug 22, 2006, 4:40:54 PM8/22/06

to Tim N. van der Leeuw, pytho...@python.org

Tim> But beware! For Python2.5 I had to change the code slightly,
Tim> because it already realized that the expression

Tim> '%s' % 'something'

Tim> will be a constant expression, and evaluates it once only... so I
Tim> had to replace '%s' with a variable, and I got the timings above
Tim> which show Python2.5 to be slightly faster than Python2.4.

Shouldn't you then get rid of any compiler optimizations your C++ compiler
does? Why penalize 2.5 because it recognizes a useful optimization?

Tim> (Next step would be to create a VB version and a Java version of
Tim> the same program, oh and perhaps to try a version that would work
Tim> with Jython... perhaps somehow w/o the 'set')

I don't recall the example exactly, but couldn't you just create a set class
that uses a dict under the covers and only implement the methods you need
for the test?

Skip

Maric Michaud

unread,

Aug 22, 2006, 4:49:34 PM8/22/06

to pytho...@python.org

Le mardi 22 août 2006 12:55, Mc Osten a écrit :
> In fact Python here is faster. Suppose it has a really optimized set
> class...

Maybe I'm missing something but the posted c++codes are not equivalent IMO to
what python is doing. I discarded the "slow" version, and tried to get the
equivalent in c++ of :

"""
#!/usr/bin/env python

size = 1000000

def f():
a = []
for i in range(size):
a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s

import time
from time import clock

f_start = clock()
f()
f_end = clock()

print "Elapsed: %f seconds" % (f_end - f_start)
"""

I came at first with the following, which is still slower than the python
version :

"""
void print_occurence_of_unique_strings(){
vector<string> a;
const string& s1 = "What do you know?" ;
const string& s2 = "so long..." ;
const string& s3 = "chicken crosses road";
const string& s4 = "fool" ;

for (long int i=0; i<SIZE ; ++i){
a.push_back(s1);
a.push_back(s2);
a.push_back(s3);
a.push_back(s4);
}
set<string> b(a.begin(), a.end());
copy(b.begin(), b.end(),
ostream_iterator<string>(cout, "\n"));
}
"""

Here, all strings, while passed by reference to the vector, are copied one by
one.
Then, I tried this, it just overcome the performance of python code, but not
in the proportion I expected :

"""
void print_occurence_of_unique_strings_compare_by_adresses(){
vector<string*> a;

string s1 = "What do you know?";
string s2 = "so long...";
string s3 = "chicken crosses road";
string s4 = "fool";
for (long int i=0; i<SIZE ; ++i){

a.push_back(&s1);
a.push_back(&s2);
a.push_back(&s3);
a.push_back(&s4);
}
set<string> b;
for (vector<string*>::iterator it=a.begin(); it!=a.end(); it++)
b.insert(**it);

copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
}
"""

The problem here, is that the strings in the set are compared by value, which
is not optimal, and I guess python compare them by adress ("s*n is s*n" has
the same complexity than "s*n == s*n" in CPython, right ?).

so, finally, the code in c++, about ten times faster than the equivalent in
python, must be :

"""
void print_occurence_of_unique_strings_compared_by_address(){
cout << "print_occurence_of_unique_strings_compared_by_address" << endl;
vector<string*> a;

string s1 = "What do you know?";
string s2 = "so long...";
string s3 = "chicken crosses road";
string s4 = "fool";
for (long int i=0; i<SIZE ; ++i){

a.push_back(&s1);
a.push_back(&s2);
a.push_back(&s3);
a.push_back(&s4);
}
set<string*> b(a.begin(), a.end());
set<string> c; // well ordered set (b is ordered by address)
for (set<string*>::iterator it=b.begin(); it!=b.end(); it++)
c.insert(**it);
copy(c.begin(), c.end(), ostream_iterator<string>(cout, "\n"));
}
"""

the result on my box is :

maric@redflag2 mar aoû 22 22:24:23:~$ g++ -O3 -o testcpp testcpp.cpp
maric@redflag2 mar aoû 22 22:24:29:~$ ./testcpp
print_occurence_of_strings

What do you know?
chicken crosses road
fool
so long...

print_occurence_of_unique_strings

What do you know?
chicken crosses road
fool
so long...

print_occurence_of_unique_strings_compared_by_address

What do you know?
chicken crosses road
fool
so long...

strings : 1.89
unique strings : 0.87
compared by address : 0.18
maric@redflag2 mar aoû 22 22:24:38:~$ python2.4 testpython.py

so long...
What do you know
fool
chicken crosses road

Elapsed: 1.680000 seconds
maric@redflag2 mar aoû 22 22:24:51:~$ g++ -v
Using built-in specs.
Target: i486-linux-gnu
Configured
with: ../src/configure -v --enable-languages=c,c++,java,fortran,objc,obj-c++,ada,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.1-1.4.2.0/jre --enable-mpfr --with-tune=i686 --enable-checking=release
i486-linux-gnu
Thread model: posix
gcc version 4.1.2 20060613 (prerelease) (Debian 4.1.1-5)

I've joined the full c++ file as an attachment.

--
_____________

Maric Michaud
_____________

Aristote - www.aristote.info
3 place des tapis
69004 Lyon
Tel: +33 426 880 097

testcpp.cpp

Tim N. van der Leeuw

unread,

Aug 22, 2006, 5:02:31 PM8/22/06

to

sk...@pobox.com wrote:
> Tim> But beware! For Python2.5 I had to change the code slightly,
> Tim> because it already realized that the expression
>
> Tim> '%s' % 'something'
>
> Tim> will be a constant expression, and evaluates it once only... so I
> Tim> had to replace '%s' with a variable, and I got the timings above
> Tim> which show Python2.5 to be slightly faster than Python2.4.
>
> Shouldn't you then get rid of any compiler optimizations your C++ compiler
> does? Why penalize 2.5 because it recognizes a useful optimization?
>

The point is that I was trying to create 400.000 string instances. The
extra optimization in 2.5 required an extra trick for that.
The idea is to compare a C++ version which creates 400.000 string
instances, with a Python version which creates 400.000 string
instances; then reduce those 400.000 instances to a set of only 4
unique strings.
(So I cannot just create a list with strings generated from numbers 1 -
400.000, and I didn't want to change the original code too much, so I
just added a trick to make Python allocate a new string each time
round.)

I agree that Python2.5 recognized a useful optimization, and didn't
wish to penalize it for that, however the optimalization was defeating
the purpose of my code in the first place!

Cheers,

--Tim

Fredrik Lundh

unread,

Aug 22, 2006, 5:15:27 PM8/22/06

to pytho...@python.org

Maric Michaud wrote:

> The problem here, is that the strings in the set are compared by value, which
> is not optimal, and I guess python compare them by adress ("s*n is s*n" has
> the same complexity than "s*n == s*n" in CPython, right ?).

wrong.

> timeit -s"s='x'; n=1000" "s*n is n*s"
1000000 loops, best of 3: 1.9 usec per loop

> timeit -s"s='x'; n=1000" "s*n == n*s"
100000 loops, best of 3: 4.5 usec per loop

</F>

Tim N. van der Leeuw

unread,

Aug 22, 2006, 5:16:58 PM8/22/06

to

Maric Michaud wrote:
> Le mardi 22 août 2006 12:55, Mc Osten a écrit :
> > In fact Python here is faster. Suppose it has a really optimized set
> > class...
>
> Maybe I'm missing something but the posted c++codes are not equivalent IMO to
> what python is doing. I discarded the "slow" version, and tried to get the
> equivalent in c++ of :
>

Your C++ version got me the following timings (using gcc 3.4.5 as the
compiler, MinGW version, with -O6):

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./testcpp.exe

print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...

strings : 2.135
unique strings : 1.103
compared by address : 0.21

For reference, Python's best time was 0.39 seconds on the same computer
(in the 'fast' version, using only 4 unique string instances).

Hmmm... Can we conclude now that carefully crafted C++ code is about
twice as fast as casually and intuitively written Python code? ;) (Just
kidding here of course)

NB: Your code now tests for address-equality. Does it also still test
for string-equality? It looks to me that it does, but it's not quite
clear to me.

Cheers,

--Tim

Mc Osten

unread,

Aug 22, 2006, 7:31:01 PM8/22/06

to

Tim N. van der Leeuw <tim.lee...@nl.unisys.com> wrote:

> NB: Your code now tests for address-equality. Does it also still test
> for string-equality? It looks to me that it does, but it's not quite
> clear to me.

It does it.

set<string*> b(a.begin(), a.end());
set<string> c; // well ordered set (b is ordered by address)
for (set<string*>::iterator it=b.begin(); it!=b.end(); it++)
c.insert(**it);
copy(c.begin(), c.end(), ostream_iterator<string>(cout, "\n"));

When we populate the first set, we get rid of all strings with same
object id/address (it test equality of pointers). Then we populate
another set (and the default equality test is on strings).

However, I would have written the code using a proper compare function
rather than using two sets. In this particular case the number of
elements of the first set is negligible in respect of the initial vector
size, thus copying it again does not take a lot of time.
But such code is optimized for the problem itself: in the real world I
suppose we would have passed set a proper comparison function that
checks address and then string equality.

Mc Osten

unread,

Aug 22, 2006, 7:31:02 PM8/22/06

to

Tim N. van der Leeuw <tim.lee...@nl.unisys.com> wrote:

> My conclusion from that is, that the vector<> or set<> implementations
> of GCC are far superior to those of VC++ 6, but that memory allocation
> for GCC 3.4.5 (MinGW version) is far worse than that of MSCRT / VC++ 6.
> (And Python still smokes them both).

It would be interesting to test it with VC 8 (2005). I have it in my
Parallels vm, but it looks like something is wrong. The very same code
takes almost a minute, I suppose there is something wrong with it
(Python is almost as fast as the python 2.4 on MacOS).

Mc Osten

unread,

Aug 22, 2006, 7:31:02 PM8/22/06

to

Tim N. van der Leeuw <tim.lee...@nl.unisys.com> wrote:

> And the results of IronPython (1.0rc2) are just in as well:

I can't test this one.

>
> And for Python 2.5:
> LeeuwT@nlshl-leeuwt ~/My Documents/Python
> $ /cygdrive/c/Python25/python.exe SpeedTest.py
> Begin Test
> Number of unique string objects: 4
> so long...
> What do you know
> fool
> chicken crosses road
> Number of unique string objects: 400000
> so long...
> What do you know
> fool
> chicken crosses road
> Fast - Elapsed: 0.440619 seconds
> Slow - Elapsed: 1.095341 seconds

What the heck... you have a Cray, haven't you?
$ /opt/misc/bin/python2.5 -O set_impl.py

so long...
What do you know
fool
chicken crosses road
so long...
What do you know
fool
chicken crosses road

Elapsed: 1.300000 seconds
Elapsed: 1.290000 seconds

Yes... good optimizer work. The 'slow' code here is faster than the fast
one.

$ python -O set_impl.py

so long...
What do you know
fool
chicken crosses road
so long...
What do you know
fool
chicken crosses road

Elapsed: 1.360000 seconds
Elapsed: 3.800000 seconds

> (Next step would be to create a VB version and a Java version of the
> same program, oh and perhaps to try a version that would work with
> Jython... perhaps somehow w/o the 'set')

Ok. I can do the Java version. If I find a RealBasic Set class I can do
it. However, I don't remember anything about VB6, and have done nothing
with .Net.
But I don't think it is that interesting. Java strings are immutable
too: I expect it to outperform Python (unless Java Set class sucks). And
I don't see the point of taking in VB.
A good BASIC implentation is comparable with Pascal or C++ speedwise.
(At least this results from Great Language Shootout and Free Basic).

Ray

unread,

Aug 22, 2006, 7:50:07 PM8/22/06

to

Tim N. van der Leeuw wrote:
> Incidentally, I also have a version compiled with VC++ 6 now... (not
> yet w/VC++ 7) .. Compiled with release-flags and maximum optimization
> for speed, here's the result of VC++ 6:

<snip>

OK, now I'm getting obsessed with this too ;-)

I'm using VC++ Express, I didn't care to tweak the optimizations, I
merely chose the "Release" configuration for the executable. It's
blazing fast, taking only 30+ ms each run.

Here's the code:

int main(){
DWORD begin = ::GetTickCount();
vector<string> a;
string c = "What do you know?";
string d = "so long...";
string e = "chicken crosses road";
string f = "fool";
for (long int i=0; i<10000 ; ++i){
a.push_back(c);
a.push_back(d);
a.push_back(e);
a.push_back(f);
}
set<string> b(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout,
"\n"));
DWORD end = ::GetTickCount();
cout << "Ends in " << (end - begin) << " ms.";
}

And here's the result:

\TestSTL\release>TestSTL.exe

What do you know?
chicken crosses road
fool
so long...

Ends in 31 ms.

I tried the original version:

int main(){
DWORD begin = ::GetTickCount();
vector<string> a;
for (long int i=0; i<10000 ; ++i){

a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<string> b(a.begin(), a.end());

unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout,
"\n"));
DWORD end = ::GetTickCount();
cout << "Ends in " << (end - begin) << " ms.";
}

And the result is only 50% slower:

\TestSTL\release>TestSTL.exe

What do you know?
chicken crosses road
fool
so long...

Ends in 47 ms.

coul...@gmail.com

unread,

Aug 22, 2006, 10:16:29 PM8/22/06

to

That's to say,
python is still much faster?

I am a c++ newbie but I think c++ should be faster here.
Maybe someone can post this to the c++ maillist and they will tell how
to accelerate it.

Tim N. van der Leeuw wrote:

Ray

unread,

Aug 22, 2006, 10:37:43 PM8/22/06

to

coul...@gmail.com wrote:
> That's to say,
> python is still much faster?

Not really, see my test, in my other post in the same thread. I'm using
VC++ Express 2005. If we're comparing with Python 2.5 I think it's just
fair that for C++ we're using the latest as well.

> I am a c++ newbie but I think c++ should be faster here.

Same here, although that said Python's implementation of those data
structure must already be as optimal as mortals can do it.

<snip>

Mc Osten

unread,

Aug 23, 2006, 2:52:52 AM8/23/06

to

Ray <ray_u...@yahoo.com> wrote:

> I'm using VC++ Express, I didn't care to tweak the optimizations, I
> merely chose the "Release" configuration for the executable. It's
> blazing fast, taking only 30+ ms each run.

Of course it is faster. We are looping 1000000 times, you just 10000.

Mc Osten

unread,

Aug 23, 2006, 2:52:53 AM8/23/06

to

Ray <ray_u...@yahoo.com> wrote:

> Not really, see my test, in my other post in the same thread. I'm using
> VC++ Express 2005. If we're comparing with Python 2.5 I think it's just
> fair that for C++ we're using the latest as well.

In your test, you are looping 10000 times, we looped 1000000.
In Python tests with 10000 elements, it was about 10 ms.

Moreover, we tried various Python and C++ configurations. Most of the
tests are done with Python 2.4, not 2.5.
And I used gcc4, that is to say the latest on my platform.

> Same here, although that said Python's implementation of those data
> structure must already be as optimal as mortals can do it.

I think this is the rationale behind it.

Mc Osten

unread,

Aug 23, 2006, 2:52:53 AM8/23/06

to

<coul...@gmail.com> wrote:

> That's to say,
> python is still much faster?

Yes it is. But of course you can't sat that "Python is faster than C++".
We found that the code to do this, written in the most natural way, is a
lot faster in Python. However, if you optimze the code, C++ gets almost
as fast.

In other benchmarks C++ outperforms Python and is 10 or 100 times
faster.

> Maybe someone can post this to the c++ maillist and they will tell how
> to accelerate it.

There are enough C++ experts here to do it. The point is another.

Ray

unread,

Aug 23, 2006, 3:15:16 AM8/23/06

to

Mc Osten wrote:
> Ray <ray_u...@yahoo.com> wrote:
>
> > I'm using VC++ Express, I didn't care to tweak the optimizations, I
> > merely chose the "Release" configuration for the executable. It's
> > blazing fast, taking only 30+ ms each run.
>
> Of course it is faster. We are looping 1000000 times, you just 10000.

Certainly--I was not comparing 1000000 against 10000. Referring to the
OP's statement: "However, while the python code gave the result almost
instantly, the C++ code took several seconds to run!" 30ms sounds like
a definite improvement over several seconds!

I'll try to tweak it later at home and report here. I'll try out the
1000000 too.

Cheers
Ray

Ray

unread,

Aug 23, 2006, 3:17:47 AM8/23/06

to

Mc Osten wrote:
> In your test, you are looping 10000 times, we looped 1000000.
> In Python tests with 10000 elements, it was about 10 ms.
>
> Moreover, we tried various Python and C++ configurations. Most of the
> tests are done with Python 2.4, not 2.5.
> And I used gcc4, that is to say the latest on my platform.

Mine's VC 2005 Express--let me put the optimization parameters later
and measure again when I get home.

<snip>

GHUM

unread,

Aug 23, 2006, 3:51:24 AM8/23/06

to

Mc Osten schrieb:

> Yes it is. But of course you can't sat that "Python is faster than C++".

Of course not. Python is faster then assembler. Proofed @ EuroPython
2006 in CERN, near the LHC Beta, in the same room many Nobel laurates
gave their presentations before.

Harald

Mc Osten

unread,

Aug 23, 2006, 3:59:13 AM8/23/06

to

Ray <ray_u...@yahoo.com> wrote:

> Certainly--I was not comparing 1000000 against 10000. Referring to the
> OP's statement: "However, while the python code gave the result almost
> instantly, the C++ code took several seconds to run!" 30ms sounds like
> a definite improvement over several seconds!

Of course. I suppose there's something broken in OP's C++ setup (in fact
the version I compiled with VCPP 2005 also takes a lot of seconds...
something like 20-30 seconds, but of course this makes me think I
haven't understood how it is supposed to work, since my gcc gives
results comparable to yours).

Mc Osten

unread,

Aug 23, 2006, 3:59:14 AM8/23/06

to

GHUM <haraldar...@gmail.com> wrote:

> Proofed @ EuroPython
> 2006 in CERN, near the LHC Beta, in the same room many Nobel laurates
> gave their presentations before.

Have you some link? I suppose it's kind of a joke they did or something
like that...

Ray

unread,

Aug 23, 2006, 4:13:12 AM8/23/06

to

Mc Osten wrote:
> Of course. I suppose there's something broken in OP's C++ setup (in fact
> the version I compiled with VCPP 2005 also takes a lot of seconds...
> something like 20-30 seconds, but of course this makes me think I
> haven't understood how it is supposed to work, since my gcc gives
> results comparable to yours).

Yeah, my guess would be either he used the Debug configuration or he
actually created a Managed executable instead of a pure Win32
application. Sigh, now I can't wait to get home and try it out :)

Maric Michaud

unread,

Aug 23, 2006, 4:24:48 AM8/23/06

to pytho...@python.org

Le mardi 22 août 2006 23:15, Fredrik Lundh a écrit :
> Maric Michaud wrote:
> > The problem here, is that the strings in the set are compared by value,
> > which is not optimal, and I guess python compare them by adress ("s*n is
> > s*n" has the same complexity than "s*n == s*n" in CPython, right ?).
>
> wrong.
>

Ah ! wrong, thanks, but not in our case :

In [80]: for i in (10**e for e in range(5)) :
....: res1 = timeit.Timer('s == t', 's, t = "e"*%i, "e"*%i'%(i,
i)).timeit()
....: res2 = timeit.Timer('s is t', 's, t = "e"*%i, "e"*%i'%(i,
i)).timeit()
....: print res1/res2
....:
....:
1.10532866525
1.27507328965
1.90244004672
8.33974283485
89.5215441627

Indeed, it's wrong for two instances of str, but not if we compare an instance
to itself :

In [79]: for i in (10**e for e in range(9)) :
....: r1=timeit.Timer('s is p', "s, p = ('s'*%s,)*2" % i).timeit()
....: r2=timeit.Timer('s == p', "s, p = ('s'*%s,)*2" % i).timeit()
....: print r1/r2
....:
....:
0.854690643008
0.872682262181
0.851785060822
0.871193603744
0.890304121256
0.86925960859
0.846364097331
0.91614070798
0.825424114324

So my lastest c++ algorithm is the good one still, as the python code is
like :

In [29]: timeit.Timer('set(t)', 't=("e"*10,)*10').timeit()
Out[29]: 1.883868932723999

In [30]: timeit.Timer('set(t)', 't=("e"*100,)*10').timeit()
Out[30]: 1.8789899349212646

and not :

In [27]: timeit.Timer('set(t)', 't=["e"*10 for i in range(10)]').timeit()
Out[27]: 2.6000990867614746

In [28]: timeit.Timer('set(t)', 't=["e"*100 for i in range(10)]').timeit()
Out[28]: 4.1173238754272461

> > timeit -s"s='x'; n=1000" "s*n is n*s"
>
> 1000000 loops, best of 3: 1.9 usec per loop
>
> > timeit -s"s='x'; n=1000" "s*n == n*s"
>
> 100000 loops, best of 3: 4.5 usec per loop
>
> </F>

--

Maric Michaud

unread,

Aug 23, 2006, 5:27:29 AM8/23/06

to pytho...@python.org

Tim, sorry for I send it to you personally, I was abused by thunderbird.

Tim N. van der Leeuw a écrit :

>
> Your C++ version got me the following timings (using gcc 3.4.5 as the
> compiler, MinGW version, with -O6):
>

> ...

>
>
> Hmmm... Can we conclude now that carefully crafted C++ code is about
> twice as fast as casually and intuitively written Python code? ;) (Just
> kidding here of course)

Every C++ code must be carefully crafted :).

But your benchmark surprise me, here is what I get on my windows box :

Python2.4 :
===========

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ which python2.4
/usr/bin/python2.4

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ which python
/cygdrive/c/Python24/python

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau

$ python2.4 testpython.py
so long...
What do you know
fool
chicken crosses road

Elapsed: 3.655000 seconds

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ python testpython.py

so long...
What do you know
fool
chicken crosses road

Elapsed: 3.077764 seconds

Cygwin version is slower, but not too much.

C++, compiled with VS2005, release, no other configuration :
============================================================

print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...

strings : 16.718 # memory allocation is quite bad
unique strings : 1.188
compared by address : 0.453

C++, with gcc 3.3/Cygwin
========================

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau

$ g++ -O3 -o testcpp testcpp.cpp

maric@aristote-2 /cygdrive/c/Documents and Settings/maric/Bureau
$ ./testcpp

print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...

strings : 17.266 # still bad
unique strings : 1.547
compared by address : 0.375

Hum, with my old gcc I get equal or better performances than with VS2005.

Finally, the equivalent code is still about 10x faster in c++ than in
Python, as it was on my Linux box.

Mc Osten a écrit :
...

> However, I would have written the code using a proper compare function
> rather than using two sets. In this particular case the number of
> elements of the first set is negligible in respect of the initial vector
> size, thus copying it again does not take a lot of time.
> But such code is optimized for the problem itself: in the real world I

Of course, it's a quick hack to get the same behavior.

> suppose we would have passed set a proper comparison function that
> checks address and then string equality.
>

Yes, furthermore, that is *exactly* what the python code is doing (see
my other post to Fredrik).

bearoph...@lycos.com

unread,

Aug 23, 2006, 8:24:32 AM8/23/06

to

This thread can be useful for ShedSkin (the Python => C++ translator),
because often it manages strings slower than CPython still, some
suggestions from a C++ expert can surely improve things a lot. C++ is
fast, but you have to use and know it well, otherwise you don't obtain
much speed.

Maybe this can be useful to speed up C++ hashing (now used in ShedSkin
too):
http://www.azillionmonkeys.com/qed/hash.html

I suggest to test this code with the D language too, it has built-in
dicts too (associative arrays, impleented with trees and not hashes),
so the source code needed is pretty short.

Bye,
bearophile

Mc Osten

unread,

Aug 23, 2006, 8:47:47 AM8/23/06

to

Ray <ray_u...@yahoo.com> wrote:

> Yeah, my guess would be either he used the Debug configuration or he
> actually created a Managed executable instead of a pure Win32
> application. Sigh, now I can't wait to get home and try it out :)

Can be. But I suppose a Managed should not get *that* slow.
IronPython on Tim's machine is still faster than C++ (even though not as
fast as CPython).

Tim N. van der Leeuw

unread,

Aug 23, 2006, 9:36:57 AM8/23/06

to

I have to admit to a stupid mistake, for which I feel quite ashamed - I
got the loop-size wrong in the Python code. So all Python results
posted by me were off by a factor of 10 :-(
I feel quite bad about that!

With the nr of loops corrected, Python on my laptop performs worse than
C++ under all circumstances, by a factor of about 2:

============ Python 2.4 =============
LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python24/python.exe SpeedTest.py

Begin Test
Number of unique string objects: 4

so long...
What do you know
fool
chicken crosses road

Number of unique string objects: 4000000

so long...
What do you know
fool
chicken crosses road

Fast - Elapsed: 4.239721 seconds
Slow - Elapsed: 11.883234 seconds

============ Python 2.5 =============

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ /cygdrive/c/Python25/python.exe SpeedTest.py

Begin Test
Number of unique string objects: 4

so long...
What do you know
fool
chicken crosses road

Number of unique string objects: 4000000

so long...
What do you know
fool
chicken crosses road

Fast - Elapsed: 4.031873 seconds
Slow - Elapsed: 11.314742 seconds

============ GCC 3.4.5, MinGW, -O6 =============

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest.exe
Begin Test

What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...

Fast - Elapsed: 2.088 seconds
Slow - Elapsed: 7.033 seconds

============ VC++ 6, 'release' build =============

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./SpeedTest_VC.exe

Begin Test

What do you know?
chicken crosses road
fool
so long...
What do you know?
chicken crosses road
fool
so long...

Fast - Elapsed: 4.585 seconds
Slow - Elapsed: 5.024 seconds

========== GCC 3.4.5, MinGW, -O6, with most optimized C++ code
==========

LeeuwT@nlshl-leeuwt ~/My Documents/Python
$ ./testcpp.exe
print_occurence_of_strings

What do you know?
chicken crosses road
fool
so long...

print_occurence_of_unique_strings

What do you know?
chicken crosses road
fool
so long...

print_occurence_of_unique_strings_compared_by_address

What do you know?
chicken crosses road
fool
so long...

strings : 2.338
unique strings : 1.109
compared by address : 0.23

LeeuwT@nlshl-leeuwt ~/My Documents/Python

============ IronPython 1.0rc2 =============

IronPython had a hard time coping with it; creating 4 million string
objects is a bit too much and the CLR was eating nearly a gigabyte of
memory near the end.
Here are the numbers:

IronPython 1.0.60816 on .NET 2.0.50727.42
Copyright (c) Microsoft Corporation. All rights reserved.
>>> import sys

>>> sys.path.append('C:/Documents and Settings/LeeuwT/My Documents/Python')
>>> import SpeedTest
>>> SpeedTest.run_test()

Begin Test
Number of unique string objects: 4

What do you know
so long...
chicken crosses road
fool

Number of unique string objects: 4000000

What do you know
so long...
chicken crosses road
fool

Fast - Elapsed: 10.501273 seconds
Slow - Elapsed: 371.047343 seconds
>>>

============ Java 1.6.0 b2 =============
Set size: 4
chicken crosses road

What do you know
fool

so long...
Set size: 4
chicken crosses road

What do you know
fool

so long...
Fast - Elapsed 1.003 seconds
Slow - Elapsed 3.96 seconds

============ Java 1.5.0 =============
Set size: 4
fool

What do you know
so long...
chicken crosses road

Set size: 4
fool

What do you know
so long...

chicken crosses road
Fast - Elapsed 1.754 seconds
Slow - Elapsed 5.044 seconds
=========================

Note that the Python code creates a set of all unique id's of all
objects in list a, and prints the length of this set, to verify that
all strings are really unique instances or duplicate instances. The C++
versions don't do that (at least not for 4 million strings); so Python
is at a slight disadvantage here. Printing the number of strings still
didn't help me catch the off-by-ten errors though.

I included a Java version of the program, and it looks like it performs
quite well compared to C++ both with jdk1.5 and jdk1.6.

I humbly apologize for my misinformation yesterday.

Regards,

--Tim

Ray

unread,

Aug 23, 2006, 10:27:40 AM8/23/06

to

Tim N. van der Leeuw wrote:

> With the nr of loops corrected, Python on my laptop performs worse than
> C++ under all circumstances, by a factor of about 2:

*Phew*

Great to know that my model of how the world works is still correct!
(at least in relation to Python and C++!) :)

Thanks,
Ray

sk...@pobox.com

unread,

Aug 23, 2006, 12:20:40 PM8/23/06

to Ray, pytho...@python.org

Ray> Same here, although that said Python's implementation of those data
Ray> structure must already be as optimal as mortals can do it.

Perhaps more optimal. We've had (tim)bots working on the problem for years.

Skip

sk...@pobox.com

unread,

Aug 23, 2006, 12:42:04 PM8/23/06

to GHUM, pytho...@python.org

>> Yes it is. But of course you can't sat that "Python is faster than
>> C++".

Harald> Of course not. Python is faster then assembler. Proofed @
Harald> EuroPython 2006 in CERN, near the LHC Beta, in the same room
Harald> many Nobel laurates gave their presentations before.

Harald, do you have a reference to a talk abstract or a paper?

Thx,

Skip

Paul Boddie

unread,

Aug 23, 2006, 1:01:32 PM8/23/06

to

Via the lightning talks page:

http://wiki.python.org/moin/EuroPy2006LightningTalks

Here's a direct link:

http://wiki.python.org/moin/EuroPy2006LightningTalks?action=AttachFile&do=get&target=%3AFasterThenAssemblerLightning.odp

Of course, Harald's interpretation of the "faster than" qualification
is broader than a pure assessment of raw performance, so I wouldn't
expect too many technical revelations. And while footage exists of at
least one talk in the CERN Auditorium which led to a Nobel prize, I
don't think that any footage of the EuroPython lightning talks (or of
any of the other talks) has been released just yet.

Paul

Amanjit Gill

unread,

Aug 23, 2006, 4:45:53 PM8/23/06

to

> I was using VC++.net and IDLE, respectively. I had expected C++ to be
> way faster. However, while the python code gave the result almost

- This code runs amortized 100ms on my machien (vc.net 2003 pro,
dinkumware stl, p4m 2.2GHz thinkpad, windows 2003 server), (10 loops in
1000ms)
- with STLPort 5.2, this code runs in 39-43ms (10 loops in 390ms ->
430ms).

a. did you compile in release mode? and if yes, vc.net _standard_
edition has no optimizing compiler in release mode, you need vc.net pro
or enterprise. or did you use vc.net 2005 ?

you should also include <algorithm> for ostream_operator.

Amanjit Gill

unread,

Aug 23, 2006, 5:18:24 PM8/23/06

to

> Hi, I'm learning STL and I wrote some simple code to compare the
> efficiency of python and STL.

Hi, you are benching heap allocations and of course heap fragmentation.
this is what devpartner c++ profiler had to say:

Method %in % with Called Average Real Name Method Children
-----------------------------------------------------------------
RtlFreeHeap 52,8 52,8 81.034 7,6 7,7
RtlAllocateHeap 19,8 19,8 81.089 2,9 2,9
main 16,9 89,7 1 198083,2 1052376,0
ExitProcess 10,3 10,3 1 120616,8 120641,3
...

So on linux its a lot better than that, because - I think - ptmalloc is
used which is based on dmalloc which is efficient for those small size
allocs (which malloc() and free() were never designed for).

on win32, activate the low fragmenting heap or write a custom stl
allocator.

_set_sbh_threshold(128); // method 1

ULONG HeapFragValue = 2;
HeapSetInformation((HANDLE)_get_heap_handle(), // method 2
HeapCompatibilityInformation,
&HeapFragValue,
sizeof(HeapFragValue));

last non-python message from me :)

Mc Osten

unread,

Aug 23, 2006, 5:24:10 PM8/23/06

to

Tim N. van der Leeuw <tim.lee...@nl.unisys.com> wrote:

> I have to admit to a stupid mistake, for which I feel quite ashamed - I
> got the loop-size wrong in the Python code. So all Python results
> posted by me were off by a factor of 10 :-(
> I feel quite bad about that!

Well, this makes *my* results quite surprising.
I checked it threetimes. I loop 1000000 times in both Python and C++,
and Python here is faster.

Mc Osten

unread,

Aug 23, 2006, 5:24:11 PM8/23/06

to

Ray <ray_u...@yahoo.com> wrote:

> Great to know that my model of how the world works is still correct!
> (at least in relation to Python and C++!) :)

So please explain my results. I loop the same number of times.

Neil Cerutti

unread,

Aug 23, 2006, 6:27:55 PM8/23/06

to

On 2006-08-23, Amanjit Gill <amanjit.s...@googlemail.com> wrote:
> you should also include <algorithm> for ostream_operator.

<ostream>, actually.

--
Neil Cerutti

Ames Andreas

unread,

Aug 24, 2006, 7:08:57 AM8/24/06

to pytho...@python.org

> -----Original Message-----
> From: python-list-bounces+andreas.ames=comer...@python.org
> [mailto:python-list-bounces+andreas.ames=comer...@python.or
> g] On Behalf Of Ray
> Sent: Wednesday, August 23, 2006 4:28 PM
> Subject: Re: Python and STL efficiency
>
>
> Tim N. van der Leeuw wrote:
> > With the nr of loops corrected, Python on my laptop
> performs worse than
> > C++ under all circumstances, by a factor of about 2:
>
> *Phew*
>

> Great to know that my model of how the world works is still correct!
> (at least in relation to Python and C++!) :)

If you're really eager to see python succeed with this 'test' or 'benchmark', if you will, you can look at the results with VC7.1. It clearly shows that much depends on the STL implementation, you use.

I use the implementations posted by Maric Michaud in http://mail.python.org/pipermail/python-list/2006-August/357593.html. The tests, for which I appended 'with hash', use a hash_set instead of a plain set.

1) vc71 with dinkumware STL (which is vc's default STL):

print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...

print_occurence_of_strings_with_hash

What do you know?
chicken crosses road
fool
so long...

print_occurence_of_unique_strings_with_hash

What do you know?
chicken crosses road
fool
so long...

strings : 10.921 <---- Oops ...
unique strings : 1.047
compared by address : 0.328
strings with hash : 11.484 <---- Oooops ...
unique strings with hash : 1.016

2) vc7.1 with STLport 5.02:

print_occurence_of_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings
What do you know?
chicken crosses road
fool
so long...
print_occurence_of_unique_strings_compared_by_address
What do you know?
chicken crosses road
fool
so long...

print_occurence_of_strings_with_hash <--- reorders
fool

What do you know?
chicken crosses road

so long...
print_occurence_of_unique_strings_with_hash <--- reorders

so long...
What do you know?
chicken crosses road
fool

strings : 2.156
unique strings : 0.953
compared by address : 0.187
strings with hash : 2.078
unique strings with hash : 0.922

This seems to substantiate this post http://sourceforge.net/mailarchive/forum.php?thread_id=30201462&forum_id=46178.

3) python 4.2 (python test.py; interestingly this is faster than with -OO)

so long...
What do you know
fool
chicken crosses road

Elapsed: 2.278332 seconds

It seems the unification of the strings via hash is not significantly better (compared to memcpy or whatever is used).

The results for the dinkumware STL are dominated by the sheer number of calls to HeapAlloc/HeapFree and to EnterCriticalSection/LeaveCriticalSection (I compiled with multithreading and linked statically). The python version won't be concerned with the latter, I guess (although just by importing 'threading' performance gets about 3.5% worse).

STLport uses for example InterlockedExchange (obviously in the default allocator) and several orders of magnitude fewer calls to the heap allocation API.

It all boils down to the fact that I can write C++ or even assembler programs that perform worse (by any degree I want) than a functionally equivalent python program. That's a trivial insight and it certainly doesn't provide evidence that python 'is faster' than c++/assembler. Even if that's just because it *isn't* and can't be.

I always wonder where this rage comes from to prove oneself or whomever that python is the best programming language in any regard as if it hadn't enough other advantages. It is certainly not the fastest/most memory efficient tool out there. One of the biggest advantages of python to me is that, even if I run into a hotspot, the transition from python to C/C++ is so easy. That means that in spite of its warts (GIL, memory efficiency ...) I don't easily come to a dead end using python. This fact lets me use python's great strengths (like programmer efficiency etc.) without tossing and turning sleeplessly in my bed.

cheers,

aa

--
Andreas Ames | Programmer | Comergo GmbH |
Voice: +49 69 7505 3213 | ames AT avaya DOT com

Neil Cerutti

unread,

Aug 24, 2006, 8:39:46 AM8/24/06

to

On 2006-08-23, Mc Osten <ri...@despammed.com> wrote:
> Ray <ray_u...@yahoo.com> wrote:
>> Great to know that my model of how the world works is still
>> correct! (at least in relation to Python and C++!) :)
>
> So please explain my results. I loop the same number of times.

Those of you experiencing a temporary obsession with this topic
are encouraged to study The Great Language Shootout, until the
obsession goes away. ;)

Your time might not be totally wasted; an opportunity to improve
the Python solutions may present itself.

http://shootout.alioth.debian.org/

--
Neil Cerutti
Strangely, in slow motion replay, the ball seemed to hang in the
air for even longer. --David Acfield

sk...@pobox.com

unread,

Aug 24, 2006, 9:20:49 AM8/24/06

to Amanjit Gill, pytho...@python.org

Amanjit> So on linux its a lot better than that, because - I think -
Amanjit> ptmalloc is used which is based on dmalloc which is efficient
Amanjit> for those small size allocs (which malloc() and free() were
Amanjit> never designed for).

And which for Python has been further optimized in obmalloc. Again, Python
wins. ;-)

Skip

Mc Osten

unread,

Aug 24, 2006, 11:18:45 AM8/24/06

to

Neil Cerutti <hor...@yahoo.com> wrote:

> Those of you experiencing a temporary obsession with this topic
> are encouraged to study The Great Language Shootout, until the
> obsession goes away. ;)

I think that everybody knows GLS. However, when I have results different
from what I expected, I try to understand where I did the wrong
assumption.

However, a recent post kind of explains what the problem is.

Mc Osten

unread,

Aug 24, 2006, 11:20:17 AM8/24/06

to

Neil Cerutti <hor...@yahoo.com> wrote:

> Those of you experiencing a temporary obsession with this topic
> are encouraged to study The Great Language Shootout, until the
> obsession goes away. ;)

I think that everybody knows GLS. However, when I have results different

from what I expected, I try to understand where I did the wrong
assumption.

But a recent post kind of explains what the problem is.

JSpr...@gmail.com

unread,

Aug 24, 2006, 2:50:46 PM8/24/06

to

Licheng Fang wrote:
> Hi, I'm learning STL and I wrote some simple code to compare the
> efficiency of python and STL.
>

> //C++
> #include <iostream>
> #include <string>
> #include <vector>
> #include <set>
> #include <algorithm>
> using namespace std;
>
> int main(){
> vector<string> a;
> for (long int i=0; i<10000 ; ++i){
> a.push_back("What do you know?");
> a.push_back("so long...");
> a.push_back("chicken crosses road");
> a.push_back("fool");
> }
> set<string> b(a.begin(), a.end());
> unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
> }

I think you probably want this C++ code instead:

//C++
#include <iostream>
#include <string>
#include <vector>
#include <set>
#include <algorithm>
using namespace std;

int main(){
vector<string> a;
a.reserve( 40000 ); //<------------------ note this change
for (long int i=0; i<10000 ; ++i){
a.push_back("What do you know?");
a.push_back("so long...");
a.push_back("chicken crosses road");
a.push_back("fool");
}
set<string> b(a.begin(), a.end());
unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout,
"\n"));
}

It will run a lot faster if it doesn't have to keep resizing the array.

Neil Cerutti

unread,

Aug 24, 2006, 3:42:31 PM8/24/06

to

I don't see why it should run a lot faster that way.

Appending elements to a vector with push_back takes amortized
constant time. In the example above, preallocating 40000 strings
saves (probably) math.log(40000, 2) reallocations of the vector's
storage along with the consequent copy-construction and
destruction of some fixed number of strings. Most of those
reallocations take place while the vector is still proportionally
quite small.

--
Neil Cerutti

Ben Sizer

unread,

Aug 25, 2006, 6:11:46 AM8/25/06

to

Math.log(40000, 2) is not a small number when talking about a
relatively expensive operation such as memory allocation and
deallocation. And the superfluous copying amounts to probably an extra
2^16 copies in this case - not exactly trivial.

--
Ben Sizer

Ray

unread,

Aug 25, 2006, 10:26:28 AM8/25/06

to

Neil Cerutti wrote:
> I don't see why it should run a lot faster that way.
>
> Appending elements to a vector with push_back takes amortized
> constant time. In the example above, preallocating 40000 strings
> saves (probably) math.log(40000, 2) reallocations of the vector's
> storage along with the consequent copy-construction and
> destruction of some fixed number of strings. Most of those
> reallocations take place while the vector is still proportionally
> quite small.

I see your logic, however it does make it run a lot faster. On my
machine, with 1000000 repetition, what used to take 5+ seconds now
takes only about 1+++ to 2 secs.

>
> --
> Neil Cerutti

Neil Cerutti

unread,

Aug 25, 2006, 10:50:10 AM8/25/06

to

Actually, I misread the suggestion.

I thought I saw:

vector<string> v;
v.reserve(40000);

instead of:

vector<string> v(40000);

The latter actually *slows* the program, according to my tests. I
think the poster meant to suggest the former, and that's what I
was discussing.

I increased the number of iterations from 10000 to 100000 for the
tests I'm reporting.

The best of five, with reserve, was 2363 clock ticks. Without
reserve, it was 3244. The suggestion to use an initial size
resulted in 3655 clocks.

So it appears you were more right (using reserve sped up the
program by 35%, just about as the numbers predict), although we
were both wrong. ;)

--
Neil Cerutti
8 new choir robes are currently needed, due to the addition of
several new members and to the deterioration of some of the
older ones. --Church Bulletin Blooper

Pebblestone

unread,

Aug 25, 2006, 3:22:48 PM8/25/06

to

I tested in Common Lisp and compared the result with python.

My PC is: 3.6GH Pentium4
My OS is: Ubuntu 6.06 i386
My lisp implementation is SBCL 0.9.8 for i386
My python's version is: V2.4.3
Both implementations were installed via apt-get install.

Here's my Lisp program:

++++++++++++++++++++++++++++++++++++++++++++++++++
;;; from slow to fast implementation

(proclaim '(optimize (speed 3) (debug 0) (safety 0) (space 0)))

(defun test1 ()
(let ((a (make-array 0 :adjustable t :fill-pointer 0))
(b nil))
(dotimes (i 1000000)
(progn
(vector-push-extend "What do you know" a)
(vector-push-extend "so long ..." a)
(vector-push-extend "chicken crosses road" a)
(vector-push-extend "fool" a)))
(setf b (remove-duplicates a :test #'string-equal))
(map 'vector #'print b)))

(defun test2 ()
(let ((a (make-array 0 :adjustable t :fill-pointer 0))
(b nil))
(dotimes (i 1000000)
(progn
(vector-push-extend "What do you know" a)
(vector-push-extend "so long ..." a)
(vector-push-extend "chicken crosses road" a)
(vector-push-extend "fool" a)))
(setf b (remove-duplicates a :test #'eq))
(map 'vector #'print b)))

(defun test3 ()
(let ((a (make-array 4000000 :element-type 'string
:adjustable nil
:fill-pointer 0))
(b nil))
(dotimes (i 1000000)
(progn
(vector-push "What do you know" a)
(vector-push "so long ..." a)
(vector-push "chicken crosses road" a)
(vector-push "fool" a)))
(setf b (remove-duplicates a))
(map 'vector #'print b)))

(defun test4 ()
(let ((a (make-array 4000000 :element-type 'string
:adjustable nil))
(b nil))
(dotimes (i 1000000)
(progn
(let ((j (1- (* 4 i))))
(setf (aref a (incf j)) "What do you know")
(setf (aref a (incf j)) "so long ...")
(setf (aref a (incf j)) "chicken crosses road")
(setf (aref a (incf j)) "fool"))))
(setf b (remove-duplicates a))
(map 'vector #'print b)))

+++++++++++++++++++++++++++++++++++++++++++++++++++++++

1. When using string equal comparator (test #1), the speed slows down
dramatically. 1.93 -> 9.35
2. It seems that append(vector-push) in lisp costs much more than
direct access via index.

Here's my benchmark result:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++
#1
(time (test1))

"What do you know"
"so long ..."
"chicken crosses road"
"fool"

Evaluation took:
9.346 seconds of real time
9.020564 seconds of user run time
0.328021 seconds of system run time
0 page faults and
457,565,688 bytes consed.

#2
(time (test2))

"What do you know"
"so long ..."
"chicken crosses road"
"fool"

Evaluation took:
1.931 seconds of real time
1.884118 seconds of user run time
0.048003 seconds of system run time
0 page faults and
49,561,312 bytes consed.

#3
(time (test3))

"What do you know"
"so long ..."
"chicken crosses road"
"fool"

Evaluation took:
1.721 seconds of real time
1.704107 seconds of user run time
0.016001 seconds of system run time
0 page faults and
32,012,040 bytes consed.

#4
(time (test4))

"What do you know"
"so long ..."
"chicken crosses road"
"fool"

Evaluation took:
0.843 seconds of real time
0.824051 seconds of user run time
0.020001 seconds of system run time
0 page faults and
32,012,040 bytes consed.

+++++++++++++++++++++++++++++++++++++++++++++++++++

Here's my python's code:

ef f():
a = []
for i in range(1000000):
a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s

import time
from time import clock

f_start = clock()
f()
f_end = clock()

print "Elapsed: %f seconds" % (f_end - f_start)

And benchmark result:

python -O test.py

so long...
What do you know
fool
chicken crosses road

Elapsed: 1.260000 seconds

Oops, Python beat native-compiled lisp code.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Any way,

MY CONCLUSION is: C++ is not the right model for symbolic computation.

bearoph...@lycos.com

unread,

Aug 25, 2006, 5:21:13 PM8/25/06

to

Pebblestone:

> (defun test4 ()
> (let ((a (make-array 4000000 :element-type 'string
> :adjustable nil))
> (b nil))
> (dotimes (i 1000000)
> (progn
> (let ((j (1- (* 4 i))))
> (setf (aref a (incf j)) "What do you know")
> (setf (aref a (incf j)) "so long ...")
> (setf (aref a (incf j)) "chicken crosses road")
> (setf (aref a (incf j)) "fool"))))
> (setf b (remove-duplicates a))
> (map 'vector #'print b)))

That test4 function can be compared to this one, with explicit
preallocation (and xrange instead of range!):

def f2():
n = 1000000
a = [None] * n * 4
for i in xrange(0, n*4, 4):
a[i] = 'What do you know'
a[i+1] = 'so long...'
a[i+2] = 'chicken crosses road'
a[i+3] = 'fool'
for s in set(a):
print s

But note this a list (that is an array, a list is a different data
structure) of python becomes filled with pointers. I don't know what
your CL does exactly.

I can also suggest you to use Psyco too here
(http://psyco.sourceforge.net/):

import psyco
psyco.bind(f2)

It makes that f2 more than twice faster here.

Bye,
bearophile

Pebblestone

unread,

Aug 25, 2006, 5:52:51 PM8/25/06

to

> But note this a list (that is an array, a list is a different data
> structure) of python becomes filled with pointers. I don't know what
> your CL does exactly.

I heard that python's list is implemented as adjustable array.

Here's my lisp implementation:

++++++++++++++++++
(defun test-list ()
(let ((a nil)

(b nil))
(dotimes (i 1000000)
(progn

(push "What do you know" a)
(push "so long ..." a)
(push "chicken crosses road" a)
(push "fool" a)))
(setf b (remove-duplicates a))
(map 'list #'print b)))
+++++++++++++++++++++++++

And the benchmark result:

(time (test-list))

"fool"
"chicken crosses road"
"so long ..."
"What do you know"
Evaluation took:
2.88 seconds of real time
2.744172 seconds of user run time
0.136009 seconds of system run time
0 page faults and
74,540,392 bytes consed.

++++++++++++++++++++++++++++++

BTW, I couldn't install psyco on my system (ubuntu), gcc just prompt to
me thousands of lines of errors and warnings.

Pebblestone

unread,

Aug 25, 2006, 5:55:21 PM8/25/06

to

Oh, I forgot.

Your python's example (use direct index array index) of my
corresponding lisp code works slower than the version which use
'append'.

This let me think how python's list is implemented.

Anyway, python's list is surprisingly efficient.

Pebblestone

unread,

Aug 25, 2006, 5:58:54 PM8/25/06

to

Here's the result:

What do you know
fool
chicken crosses road

f elapsed: 1.260000 seconds
f2 elapsed 2.110000 seconds

bearoph...@lycos.com

unread,

Aug 25, 2006, 6:19:42 PM8/25/06

to

Pebblestone:

>I heard that python's list is implemented as adjustable array.

Correct, an array geometrically adjustable on the right.

>Here's my lisp implementation:<

What's the memory size of a before computing b? You can compare it with
Python, that may need less memory (because the array contains
pointers).

>BTW, I couldn't install psyco on my system (ubuntu), gcc just prompt to me thousands of lines of errors and warnings.<

Find a Win box ;-) It's already compiled for it (for Py 2.3, 2.4).

>Your python's example (use direct index array index) of my corresponding lisp code works slower than the version which use 'append'.<

For me (a slow PC) it's almost twice faster, computer life is usually
complex.
For me using the esplicit allocation + Psyco makes that program about 4
times faster (from 8 to 2 seconds).

>This let me think how python's list is implemented.<

You also have to think how the * allocation is implemented and many
other things :-)
The list implementation is rather readable, Python sources are online
too.

>Anyway, python's list is surprisingly efficient.<

But its access isn't that fast :-) Psyco helps.

Bye,
bearophile

Pebblestone

unread,

Aug 25, 2006, 6:39:08 PM8/25/06

to

> What's the memory size of a before computing b? You can compare it with
> Python, that may need less memory (because the array contains
> pointers).

Here's the memory usage:

1) before the loop ( fully garbage collected)
29,052,560 bytes, 757,774 objects.

2) after the loop
103,631,952 bytes, 8,760,495 objects.

It seems A has consumed 74M bytes, 8bytes each cell. That make sense
because a cell in list consists of 2 pointers, (car cdr), and an mem
address is 32 bit.

Pebblestone

unread,

Aug 25, 2006, 6:54:40 PM8/25/06

to

Sorry, I did some miscalculation.... what a shame.....

bearoph...@lycos.com

unread,

Aug 25, 2006, 7:20:19 PM8/25/06

to

Pebblestone:

> Sorry, I did some miscalculation.... what a shame.....

Don't worry.
For me using Py 2.4.3 those memory values are 4536 before and 20184 kb
after, it means a difference of 15648 kb, that equals to about 16023552
bytes, that equals to about 1000000 * 4 * 4. That means 4 bytes for
each string reference into the array. If you don't use Psyco the
starting memory value is quite lower.

If you don't use Psyco but you use the append, the values are 1384 and
18880, this means about 4.47 bytes / value, because python lists grow
geometrically, so there is some wasted space.

I find the memory used with PsList called from the Python script:
http://www.sysinternals.com/Utilities/PsList.html

Bye,
bearophile

Pebblestone

unread,

Aug 25, 2006, 8:00:18 PM8/25/06

to

Ruby is also not far away :-)

Here's my code:

++++++++++++++++++++++++++++++++++++++++
require 'time'

def f
a = []
1000000.times do
a.push "What do you know"
a.push "so long ..."
a.push "chicken crosses road"
a.push "fool"
end
b = a.uniq
b.each do |x|
puts x
end
end

def f2
a = Array.new(4000000)
1000000.times do |i|

a[i] = "What do you know"
a[i+1] = "so long ..."

a[i+2] = "chicken crosses road"
a[i+3] = "fool"

end
b = a.uniq
b.each do |x|
puts x
end
end

f_start = Time.now
f
f_end = Time.now

f2_start = Time.now
f2
f2_end = Time.now

puts "f: Elapsed time: #{f_end - f_start} sec."
puts "f2: Elapsed time: #{f2_end - f_start} sec."

++++++++++++++++++++++++++++++++++++++++++

And the benchmark result:

What do you know

so long ...
chicken crosses road
fool
What do you know
so long ...
chicken crosses road
fool

nil
f: Elapsed time: 3.600294 sec.
f2: Elapsed time: 11.182927 sec.

++++++++++++++++++++++++++++++++++++++++++

I like ruby because its purity. :p

Licheng Fang wrote:
> Hi, I'm learning STL and I wrote some simple code to compare the
> efficiency of python and STL.
>
> //C++
> #include <iostream>
> #include <string>
> #include <vector>
> #include <set>
> #include <algorithm>
> using namespace std;
>
> int main(){
> vector<string> a;
> for (long int i=0; i<10000 ; ++i){
> a.push_back("What do you know?");
> a.push_back("so long...");
> a.push_back("chicken crosses road");
> a.push_back("fool");
> }
> set<string> b(a.begin(), a.end());
> unique_copy(b.begin(), b.end(), ostream_iterator<string>(cout, "\n"));
> }
>

> #python
> def f():
> a = []
> for i in range(10000):

> a.append('What do you know')
> a.append('so long...')
> a.append('chicken crosses road')
> a.append('fool')
> b = set(a)
> for s in b:
> print s
>

> I was using VC++.net and IDLE, respectively. I had expected C++ to be
> way faster. However, while the python code gave the result almost

> instantly, the C++ code took several seconds to run! Can somebody
> explain this to me? Or is there something wrong with my code?

andrei....@gmail.com

unread,

Aug 27, 2006, 6:15:42 AM8/27/06

to

---------------------------- C++ --------------------------
#include <iostream>
#include <vector>
#include <string>
#include <set>
#include <algorithm>
#include <windows.h>

using namespace std;

int main()
{
DWORD ticks = ::GetTickCount();

const string s1("What do you know");
const string s2("So long...");
const string s3("chicken crosses road");
const string s4("fool");

typedef vector<const string*> str_vector_t;
typedef str_vector_t::iterator vec_iter;
typedef set<const string*> str_set_t;
typedef str_set_t::const_iterator set_iter;

const int size = 1000000;
str_vector_t vec;
vec.reserve(size*4);

for(int i = 0; i < size; ++i){
vec.push_back(&s1);
vec.push_back(&s2);
vec.push_back(&s3);
vec.push_back(&s4);
}

vec_iter new_end = unique(vec.begin(), vec.end());
str_set_t set_(vec.begin(), new_end);

for(set_iter it = set_.begin(); it != set_.end(); ++it){
cout<<*it<<endl;
}
cout<<::GetTickCount()-ticks<<endl;

return 0;
}

In MS VC+ 2005 in release configuration it gets the work done in 187
ms.

---------------- Python + Psyco----------------
def f():
a = []
for i in range(1000000):

a.append('What do you know')
a.append('so long...')
a.append('chicken crosses road')
a.append('fool')
b = set(a)
for s in b:
print s

import time
from time import clock

import psyco
psyco.full()

f_start = clock()
f()
f_end = clock()

print "Elapsed: %f seconds" % (f_end - f_start)

In Python in manages to do the same in 1.8 secs (psyco boosts it to
0.7; see below)
so long...

What do you know
fool
chicken crosses road

Elapsed: 0.772899 seconds

Well, that's how it is in my book.

Regards,
Andrei