Container Performance


Bjarne Stroustrup

May 29, 2003, 2:29:36 PM
/* Standard Container Benchmark

Version 0.9, May 23, 2003

The primary purpose of this benchmark is to show how different standard
containers are in terms of performance. We have observed that programmers
often use sets, multisets, or deques in situations where sorted vectors
are the optimal solution. Similarly, programmers often choose lists simply
because they do a few insertions and deletions, without knowing that vectors
are more efficient at that for small containers.

Frequently, of course, performance does not matter,
but it is important that programmers are aware of the consequences of their
choices. We are not saying that only vectors should be used; there are
cases when one really needs a more complicated data structure, but one needs to
understand the price of the additional functionality.

The secondary purpose of the benchmark is to encourage compiler and library vendors to
keep improving performance. For example, it is not acceptable that some compilers give
you a sizable penalty for using vector iterators instead of pointers. It is also quite
clear that performance of other standard containers could be improved.

The benchmark performs the same task 7 times using different data structures:
array, vector (using a pointer as iterator), vector (using the default vector iterator),
list, deque, set and multiset. The task is to remove duplicates from a sequence of doubles.
This is a simple test, but it illustrates the performance of containers.

It is clear that the benchmark needs to be eventually extended
to slists and even to hash-sets and hash-maps, but we decided that at present it
should work only with the standard containers. As the standard grows, so can
the benchmark. The additional benefit of not including hash based containers is
that now all the tests have identical asymptotic complexity and, even more
importantly, do almost the same number of comparisons. The difference is only
in data structure overhead and cache behavior.

The number of times that a given test is run is inversely proportional to N log N, where N is the
size of the sequence. This allows one to see how containers of different sizes behave.
The sequence sizes used for the benchmark are: 10, 100, 1000, 10000, 100000, 1000000.

The time taken for a run of the benchmark depends on the machine used, the compiler used,
the compiler and optimizer settings used, as well as the standard library. Please note that
the time taken could be several minutes - and much more if you use a slow debug mode.

The output is a table where columns are separated by tabs and rows by newlines. This is
barely ok for a human reader, but ideal for feeding into a program, such as a spreadsheet
for display or analysis.

If you want to add your own test of a container, add the name of your container to
the "header" string, write a test function like the existing ones, e.g. vector_test,
and add a call of "run" for your test in "run_tests".


Alex Stepanov
step...@adobe.com

Bjarne Stroustrup
b...@cs.tamu.edu

*/

#include <stddef.h> // some older implementations lack <cstddef>
#include <time.h>
#include <math.h>
#include <stdlib.h>

#include <vector>
#include <algorithm>
#include <list>
#include <deque>
#include <set>

#include <iostream>
#include <iomanip>

typedef double element_t;

using namespace std;

vector<double> result_times; // results are pushed into this vector

const char header[] =
"\tarray"
"\tvector with pointers"
"\tvector with iterators"
"\tdeque"
"\tlist"
"\tset"
"\tmultiset";

void do_head()
{
    cout << "size" << header << '\n';
}

int do_tail()
{
    // in case you want to stop for confirmation in a windows console application
    //char c;
    //cin >> c;
    return 0;
}

void do_row(int size)
{
    cout << size;
    cout << fixed << setprecision(2);
    for (size_t i = 0; i < result_times.size(); ++i)
        cout << '\t' << result_times[i];
    cout << '\n';
}

clock_t start_time;

inline void start_timer() { start_time = clock(); }

inline double timer()
{
    clock_t end_time = clock();
    return (end_time - start_time)/double(CLOCKS_PER_SEC);
}

typedef void(*Test)(element_t*, element_t*, int);

void run(Test f, element_t* first, element_t* last, int number_of_times)
{
    start_timer();
    while (number_of_times-- > 0) f(first, last, number_of_times);
    result_times.push_back(timer());
}

void array_test(element_t* first, element_t* last, int number_of_times)
{
    element_t* array = new element_t[last - first];
    copy(first, last, array);
    sort(array, array + (last - first));
    unique(array, array + (last - first));
    delete [] array;
}

void vector_pointer_test(element_t* first, element_t* last, int number_of_times)
{
    vector<element_t> container(first, last);
    // &*container.begin() gets us a pointer to the first element
    sort(&*container.begin(), &*container.end());
    unique(&*container.begin(), &*container.end());
}

void vector_iterator_test(element_t* first, element_t* last, int number_of_times)
{
    vector<element_t> container(first, last);
    sort(container.begin(), container.end());
    unique(container.begin(), container.end());
}

void deque_test(element_t* first, element_t* last, int number_of_times)
{
    // deque<element_t> container(first, last); CANNOT BE USED BECAUSE OF MSVC++ 6
    deque<element_t> container(size_t(last - first), 0.0);
    copy(first, last, container.begin());
    sort(container.begin(), container.end());
    unique(container.begin(), container.end());
}

void list_test(element_t* first, element_t* last, int number_of_times)
{
    list<element_t> container(first, last);
    container.sort();
    container.unique();
}

void set_test(element_t* first, element_t* last, int number_of_times)
{
    set<element_t> container(first, last);
}

void multiset_test(element_t* first, element_t* last, int number_of_times)
{
    multiset<element_t> container(first, last);
    typedef multiset<element_t>::iterator iterator;
    {
        iterator first = container.begin();
        iterator last = container.end();

        while (first != last) {
            iterator next = first;
            if (++next == last) break;
            if (*first == *next)
                container.erase(next);
            else
                ++first;
        }
    }
}

void initialize(element_t* first, element_t* last)
{
    element_t value = 0.0;
    while (first != last) {
        *first++ = value;
        value += 1.;
    }
}

double logtwo(double x)
{
    return log(x)/log((double) 2.0);
}

const int largest_size = 1000000;

int number_of_tests(int size)
{
    double n = size;
    double largest_n = largest_size;
    return int(floor((largest_n * logtwo(largest_n)) / (n * logtwo(n))));
}

void run_tests(int size)
{
    const int n = number_of_tests(size);
    const size_t length = 2*size;
    result_times.clear();

    // make a random test set of the chosen size:
    vector<element_t> buf(length);
    element_t* buffer = &buf[0];
    element_t* buffer_end = buffer + length;
    initialize(buffer, buffer + size);     // elements
    initialize(buffer + size, buffer_end); // duplicate elements
    random_shuffle(buffer, buffer_end);

    // test the containers:
    run(array_test, buffer, buffer_end, n);
    run(vector_pointer_test, buffer, buffer_end, n);
    run(vector_iterator_test, buffer, buffer_end, n);
    run(deque_test, buffer, buffer_end, n);
    run(list_test, buffer, buffer_end, n);
    run(set_test, buffer, buffer_end, n);
    run(multiset_test, buffer, buffer_end, n);
    do_row(size);
}

int main()
{
    do_head();
    const int sizes [] = {10, 100, 1000, 10000, 100000, 1000000};
    const int n = sizeof(sizes)/sizeof(int);
    for (int i = 0; i < n; ++i) run_tests(sizes[i]);
    return do_tail();
}


[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

Siemel Naran

May 30, 2003, 7:36:24 AM
"Bjarne Stroustrup" <b...@research.att.com> wrote in message

> The primary purpose of this benchmark is to show how different standard
> containers are in terms of performance. We have observed that programmers
> often use sets, multisets, deques in the situations where sorted vectors
> are the optimal solutions. Similarly, programmers often choose lists simply
> because they do a few insertions and deletions without knowing that vectors
> are more efficient at that for small containers.

On BC6 release mode the results ==>

size array vector vector deque list set multiset
10 2.69 2.26 2.24 7.00 14.99 3.09 5.60
100 1.48 1.51 1.50 4.70 4.38 2.45 3.74
1000 1.41 1.44 1.44 3.73 3.28 2.01 2.95
10000 1.42 1.45 1.45 3.72 4.40 2.38 5.07
100000 1.86 1.88 1.88 4.38 9.74 6.52 9.38
1000000 2.46 2.34 2.31 4.96 6.92 6.23 8.54

--
+++++++++++
Siemel Naran

Tim Love

May 30, 2003, 2:34:04 PM
"Siemel Naran" <Sieme...@KILL.att.net> writes:

>"Bjarne Stroustrup" <b...@research.att.com> wrote in message

>> The primary purpose of this benchmark is to show how different standard
>> containers are in terms of performance. We have observed that programmers
>> often use sets, multisets, deques in the situations where sorted vectors
>> are the optimal solutions. Similarly, programmers often choose lists
>> simply because they do a few insertions and deletions without knowing
>> that vectors are more efficient at that for small containers.

>On BC6 release mode the results ==>

>size array vector vector deque list set multiset
>10 2.69 2.26 2.24 7.00 14.99 3.09 5.60
>100 1.48 1.51 1.50 4.70 4.38 2.45 3.74
>1000 1.41 1.44 1.44 3.73 3.28 2.01 2.95
>10000 1.42 1.45 1.45 3.72 4.40 2.38 5.07
>100000 1.86 1.88 1.88 4.38 9.74 6.52 9.38
>1000000 2.46 2.34 2.31 4.96 6.92 6.23 8.54

Here are the results from HP-UX's aCC -O
size array v ptr v it deque list set multiset

10 3.30 3.60 3.62 16.30 15.19 9.79 14.50
100 1.59 1.67 1.67 6.42 8.89 4.83 7.99
1000 1.35 1.41 1.40 5.03 7.40 3.83 5.78
10000 1.41 1.30 1.30 4.53 6.79 3.53 4.82
100000 1.37 1.44 1.43 4.53 11.28 4.54 7.61
1000000 1.52 1.54 1.53 4.69 17.59 9.17 12.35

Siemel Naran

May 31, 2003, 5:12:04 AM
"Bjarne Stroustrup" <b...@research.att.com> wrote in message

> The primary purpose of this benchmark is to show how different standard
> containers are in terms of performance. We have observed that programmers
> often use sets, multisets, deques in the situations where sorted vectors
> are the optimal solutions. Similarly, programmers often choose lists simply
> because they do a few insertions and deletions without knowing that vectors
> are more efficient at that for small containers.

Sometimes we want a map<T,U> with a two-phase usage: we populate the map in
phase one, and read from it in phase two, treating it as const. In this case
it might be better to use a std::vector<std::pair<T,U> >, populate the
vector, sort it, then use lower_bound and upper_bound to search the vector.

Can anyone think of other examples or heuristics that tell us what container
to use where?

--
+++++++++++
Siemel Naran

Dhruv

May 31, 2003, 5:14:45 AM
t...@eng.cam.ac.uk (Tim Love) wrote in message news:<bb7kdb$agu$1...@pegasus.csx.cam.ac.uk>...
> "Siemel Naran" <Sieme...@KILL.att.net> writes:
>

> >On BC6 release mode the results ==>
>
> >size array vector vector deque list set multiset
> >10 2.69 2.26 2.24 7.00 14.99 3.09 5.60
> >100 1.48 1.51 1.50 4.70 4.38 2.45 3.74
> >1000 1.41 1.44 1.44 3.73 3.28 2.01 2.95
> >10000 1.42 1.45 1.45 3.72 4.40 2.38 5.07
> >100000 1.86 1.88 1.88 4.38 9.74 6.52 9.38
> >1000000 2.46 2.34 2.31 4.96 6.92 6.23 8.54
>
> Here are the results from HP-UX's aCC -O
> size array v ptr v it deque list set multiset
>
> 10 3.30 3.60 3.62 16.30 15.19 9.79 14.50
> 100 1.59 1.67 1.67 6.42 8.89 4.83 7.99
> 1000 1.35 1.41 1.40 5.03 7.40 3.83 5.78
> 10000 1.41 1.30 1.30 4.53 6.79 3.53 4.82
> 100000 1.37 1.44 1.43 4.53 11.28 4.54 7.61
> 1000000 1.52 1.54 1.53 4.69 17.59 9.17 12.35
>

/* Results: On an AMD Duron 1GHz processor, 128M RAM.
g++ 3.2: gcc 3.2: libstdc++ (SGI derived).
Compile command: g++ -Wall -O3 -o container_test container_test.cpp


size     array  v-ptr  v-iter  deque  list   set    multiset
10       1.04   1.17   1.42    2.49   10.20  2.57   4.44
100      0.76   0.76   1.04    1.88   3.37   2.00   3.11
1000     0.84   0.85   1.04    1.65   2.59   1.61   2.27
10000    1.06   1.05   1.24    1.75   5.64   3.26   5.95
100000   1.00   1.00   1.16    1.60   7.81   5.86   7.95
1000000  1.12   1.13   1.30    1.74   7.28   8.24   11.50

*/

Francis Glassborow

May 31, 2003, 11:34:15 AM
In message <cf18e89.03053...@posting.google.com>, Dhruv
<dhru...@gmx.net> writes

>/* Results: On an AMD Duron 1GHz processor, 128M RAM.
> g++ 3.2: gcc 3.2: libstdc++ (SGI derived).
> Compile command: g++ -Wall -O3 -o container_test container_test.cpp
>
>
>size     array  v-ptr  v-iter  deque  list   set    multiset
>10       1.04   1.17   1.42    2.49   10.20  2.57   4.44
>100      0.76   0.76   1.04    1.88   3.37   2.00   3.11
>1000     0.84   0.85   1.04    1.65   2.59   1.61   2.27
>10000    1.06   1.05   1.24    1.75   5.64   3.26   5.95
>100000   1.00   1.00   1.16    1.60   7.81   5.86   7.95
>1000000  1.12   1.13   1.30    1.74   7.28   8.24   11.50
>
>*/
While the results I got using G++ in its MinGW form were slightly
different, the overall pattern was the same. IIUC this suggests that if a
vector will do (you can accept the invalidation of iterators by
push_back()), using a vector with pointers is as good as using an array.
A vector with iterators is better in all cases than any other container
type.

The odd feature to my mind is the shape of the performance curve for
sets and multisets which seems U shaped for the data we have so far.

I suggest that one small addition and one small change would make posting
these results easier:

add

ofstream out("data.txt");

as a global variable :(

and replace uses of cout with out.


--
ACCU Spring Conference 2003 April 2-5
The Conference you should not have missed
ACCU Spring Conference 2004 Late April
Francis Glassborow ACCU

LLeweLLyn

May 31, 2003, 6:45:39 PM

Here are results using g++3.3 -O3, on i686-linux-gnu, with some
formatting changes for easier reading:

size array vector with pointers vector with iterators

10 2.02 2.13 2.02
100 1.36 1.32 1.35
1000 1.33 1.31 1.40
10000 1.63 1.65 1.77
100000 1.50 1.53 1.60
1000000 1.78 1.73 1.69


size deque list set multiset
10 3.35 12.56 3.18 5.71
100 2.47 4.61 2.54 3.78
1000 2.26 3.60 2.06 3.06
10000 2.39 6.95 3.87 7.01
100000 2.30 9.54 6.90 9.57
1000000 2.44 8.91 9.76 13.46

It seems gcc 3.3 is finally bringing gcc back down to the abstraction
penalty gcc 2.95.x had years ago ... :-/

(note: over several runs I noticed a variance of about +/-10%, so don't
read too much into differences in the lower digits.)

John Potter

Jun 1, 2003, 4:35:35 AM
On 29 May 2003 14:29:36 -0400, b...@research.att.com (Bjarne Stroustrup)
wrote:

[benchmark]

A nit on fairness. The array-based containers never destroy the items that
unique leaves behind (remove without erase), while the node-based
containers do destroy the items they erase.

vector_pointer_test
    // unique(&*container.begin(), &*container.end());
    container.erase(unique(&*container.begin(), &*container.end())
                    - &*container.begin() + container.begin(),
                    container.end());

vector_iterator_test
    // unique(container.begin(), container.end());
    container.erase(unique(container.begin(), container.end()),
                    container.end());

deque_test
    // unique(container.begin(), container.end());
    container.erase(unique(container.begin(), container.end()),
                    container.end());

The difference is only noticeable for size 10; however, it does add
consistency to the tests.

It might also be interesting to show the cost of using the general
algorithm when a specialization is available.

void list_test2(element_t* first, element_t* last, int number_of_times)
{
    list<element_t> container(first, last);
    container.sort();
    container.erase(unique(container.begin(), container.end()),
                    container.end());
}

With a little help, it can also be used with multiset in place of the
custom code.

template <class Fiter>
struct UnconstIter {
    // Minimal required for my tests. Add more if needed.
    typedef forward_iterator_tag iterator_category;
    typedef typename Fiter::value_type value_type;
    typedef typename Fiter::difference_type difference_type;
    typedef value_type* pointer;
    typedef value_type& reference;
    Fiter p;
    UnconstIter (Fiter p) : p(p) { }
    UnconstIter& operator++ () { ++p; return *this; }
    reference operator* () const { return const_cast<reference>(*p); }
    bool operator== (UnconstIter rhs) const { return p == rhs.p; }
    bool operator!= (UnconstIter rhs) const { return p != rhs.p; }
};

void multiset_test2(element_t* first, element_t* last, int number_of_times)
{
    multiset<element_t> container(first, last);

    typedef UnconstIter<multiset<element_t>::iterator> Uit;
    container.erase(unique(Uit(container.begin()),
                           Uit(container.end())).p, container.end());
}

Note: If you are thinking of posting that this will not work, please
try it and then study the data structure. For the unique part, the
multiset is a sorted sequence. For the erase part, it is a balanced
structure where the keys are not considered.

In both cases, I observed a 10% to 20% increase in time, which decreased
as size increased. The only strange value was list at 1M, where there
was a large increase (using gcc only). Considering that creating the
ordered collection is N lg N and unique is N, the increase in total time
represents more than doubling the unique time. This seems a bit
strange.

Since there seems to be some interest in seeing results, here is
gcc-2.95.2 on an old RS6000 with AIX-4.3.

size array vectorP vectorI deque list listA set multi multiA
10 12.80 11.27 11.06 33.20 87.75 88.85 24.76 36.53 41.65
100 5.58 5.50 5.48 17.00 28.39 28.61 14.72 22.04 24.90
1000 5.26 5.25 5.26 14.97 26.16 26.71 13.77 21.23 24.38
10000 5.83 5.83 5.82 14.72 30.20 31.06 16.06 23.58 26.61
100000 6.80 6.78 6.82 15.41 41.48 44.14 25.78 35.57 39.66
1000000 7.28 7.29 7.32 15.61 35.29 49.37 31.95 40.90 45.85

Result observations.

The only posted case where I know that vector::iterator is not a
pointer is gcc 3.2. It seems that the compiler is not able to make
this simple pointer wrapper as good as a pointer.

The non-vector containers are much better with gcc-3.2 than above.

The containers with space overhead show the time increase when they
force swapping at smaller sizes.

The results should show users the high cost of non-vectors. Do they
show implementers that they are doing poorly or is it just a fact of
complexity?

John

John Potter

Jun 1, 2003, 4:37:17 AM
On 31 May 2003 11:34:15 -0400, Francis Glassborow
<francis.g...@ntlworld.com> wrote:

> The odd feature to my mind is the shape of the performance curve for
> sets and multisets which seems U shaped for the data we have so far.

The tests are based upon an assumption that all tests take NlgN time.
The number of times the test is performed is reduced as the size of
the data set increases. If the assumption were true, all tests for a
container would produce the same result. Any deviation we see is likely
a result of lg rounding, allocator block size, core size, cache size and
other things unrelated to the containers.

My results show the same "U" shape for all containers. It may be more
noticeable for set because of the high space overhead exhausting real
memory sooner.

> I suggest that one small addition and one small change makes posting
> these results easier:

Surely you jest. Use ">data.txt" on the command line. Why would you hard-wire a file name?

John

LLeweLLyn

Jun 1, 2003, 4:39:53 AM
"Siemel Naran" <Sieme...@KILL.att.net> writes:

> "Bjarne Stroustrup" <b...@research.att.com> wrote in message
>
> > The primary purpose of this benchmark is to show how different standard
> > containers are in terms of performance. We have observed that programmers
> > often use sets, multisets, deques in the situations where sorted vectors
> > are the optimal solutions. Similarly, programmers often choose lists simply
> > because they do a few insertions and deletions without knowing that vectors
> > are more efficient at that for small containers.
>
> Sometimes we want a map<T,U> with the two phase usage that we'll populate
> the map in phase one, and read from the map in phase two treating the map as
> const. In this case it might be better to use a
> std::vector<std::pair<T,U>>, populate the vector, sort, then use lower_bound
> and upper_bound to search the vector.
>
> Can anyone think of other examples or heuristics that tell us what container
> to use where?

[snip]

*shrug* ... use vector if you think the total data size is smaller
than 1/2 your expected (L2) cache size?

Stephen Howe

Jun 1, 2003, 4:42:24 AM
> void multiset_test(element_t* first, element_t* last, int number_of_times)
> {
>     multiset<element_t> container(first, last);
>     typedef multiset<element_t>::iterator iterator;
>     {
>         iterator first = container.begin();
>         iterator last = container.end();
>
>         while (first != last) {
>             iterator next = first;
>             if (++next == last) break;
>             if (*first == *next)
>                 container.erase(next);
>             else
>                 ++first;
>         }
>     }
> }

I prefer (tighter while loop)

void multiset_test(element_t* first, element_t* last, int number_of_times)
{
    multiset<element_t> container(first, last);
    typedef multiset<element_t>::iterator iterator;
    {
        iterator first = container.begin();
        iterator last = container.end();

        if (first != last) {
            iterator next = first;
            ++next;

            while (next != last) {
                if (*first == *next)
                    container.erase(first);
                first = next;
                ++next;
            }
        }
    }
}

I also think that the deque test should be the same as the vector test: if
VC++ 6.0's predefined macro indicates VC6, use copy(); otherwise, for
other compilers, use the deque range constructor.

Stephen Howe

THORSTEN OTTOSEN

Jun 1, 2003, 4:46:40 AM
machine: 900MHz P3, Windows XP

On visual C++ v 7.1 ( cl /O2 )
-------------------------------
size     array  v-ptr  v-iter  deque  list   set    multiset
10       3.04   3.19   3.72    10.81  21.41  5.91   11.02
100      1.61   1.63   1.91    6.30   6.52   3.42   6.12
1000     1.41   1.42   1.65    5.27   4.54   2.59   5.07
10000    1.27   1.26   1.45    5.09   4.52   2.85   6.42
100000   1.56   1.55   1.66    5.35   6.40   5.10   8.32
1000000  1.63   1.63   1.76    5.47   7.12   6.39   9.69

On Comeau 4.3.0.1 with visual 7.1 backend, libcomo (como /O2 )
--------------------------------------------------------
size     array  v-ptr  v-iter  deque  list   set    multiset
10       1.52   1.60   1.59    7.27   40.10  5.48   6.94
100      0.76   0.78   0.78    3.69   15.68  3.21   4.06
1000     0.69   0.70   0.70    3.17   12.32  2.43   3.00
10000    0.83   0.78   0.67    3.04   11.15  2.15   3.05
100000   1.00   1.00   1.00    3.34   13.91  3.78   5.34
1000000  1.19   1.19   1.19    3.40   11.07  3.56   4.44

regards

Thorsten Ottosen, Aalborg University

Francis Glassborow

Jun 1, 2003, 9:04:12 AM
In message <mugidv8o3uk2v58a4...@4ax.com>, John Potter
<jpo...@falcon.lhup.edu> writes

> > I suggest that one small addition and one small change makes posting
> > these results easier:
>
>Surely you jest. >data.txt Why would you hard wire a file name?

Because it was the minimal change from the hard-wired console output.

--
ACCU Spring Conference 2003 April 2-5
The Conference you should not have missed
ACCU Spring Conference 2004 Late April
Francis Glassborow ACCU

Troll_King

Jun 1, 2003, 9:05:39 AM
b...@research.att.com (Bjarne Stroustrup) wrote in message
news:<HFnqC...@research.att.com>...

> /* Standard Container Benchmark
>
> Version 0.9, May 23, 2003
>
> The primary purpose of this benchmark is to show how different standard
> containers are in terms of performance.

> Frequently, of course, performance does not matter,
> but it is important that programmers are aware of the consequences of their
> choices. We are not saying that only vectors should be used, there are
> cases when one really needs a more complicated data structure, but one
> needs to understand the price for additional functionality.

> The benchmark is doing the same task 7 times using different data structures:
> array, vector (using a pointer as iterator), vector (using the default
> vector iterator), list, deque, set and multiset. The task is to remove
> duplicates from a sequence of doubles. This is a simple test, but it
> illustrates the performance of containers.


If I were a Master Class programmer, then I would exchange a class or
struct type for the 'double' built-in type and observe vector container
performance degrade significantly compared to std::list. So one can
probably say that std::vector is the best choice in general when storing
built-in types; however, look into other containers when storing class
and struct types.

Alex Vinokur

Jun 1, 2003, 1:05:08 PM
###########################
### Windows, MINGW, g++ ###
###########################


==========================================
Windows 2000
Intel(R) Celeron(R) CPU 1.70 GHz
Total Physical Memory (RAM) 523,760 KB
Available Physical Memory 389,360 KB
Total Virtual Memory 1,407,956 KB
Available Virtual Memory 1,119,024 KB
Page File Space 884,196 KB

MinGW 2.0.0.-2
GNU g++ version 3.2 (mingw special 20020817-1)
The version of the C++ library in compressed ISO date format :
__GLIBCPP__ == 20020816
==========================================


--------------- No optimization ---------------
size array v-ptr v-iter deque list set multiset
10 2.39 2.56 4.17 6.71 40.98 11.80 16.60
100 2.84 2.86 5.54 8.17 20.01 9.35 11.76
1000 2.85 2.87 5.22 7.13 17.07 8.56 10.14
10000 2.93 2.91 5.07 6.63 23.09 11.96 17.82
100000 2.50 2.51 4.50 5.86 25.26 16.69 19.97
1000000 2.71 2.70 4.91 6.31 20.07 17.04 19.74


--------------- Optimization O1 ---------------
size array v-ptr v-iter deque list set multiset
10 3.39 3.42 4.67 6.30 13.68 3.38 6.38
100 1.81 1.78 2.05 3.63 5.92 3.07 4.57
1000 1.95 1.94 2.19 3.29 5.19 3.11 4.11
10000 2.13 2.12 2.15 3.27 13.29 7.76 14.25
100000 2.15 1.93 2.07 3.01 17.27 12.79 16.08
1000000 2.10 2.10 2.16 3.09 11.98 12.44 15.23


--------------- Optimization O2 ---------------
size array v-ptr v-iter deque list set multiset
10 3.44 3.53 3.48 6.73 14.67 3.97 7.28
100 2.12 1.88 1.86 3.77 6.40 3.23 5.24
1000 1.97 1.98 2.11 3.31 5.42 3.09 4.37
10000 2.15 2.14 2.21 3.23 13.04 7.56 13.84
100000 1.92 1.91 2.04 2.85 16.81 12.67 16.22
1000000 2.12 2.11 2.22 3.00 12.00 12.47 15.38


--------------- Optimization O3 ---------------
size array v-ptr v-iter deque list set multiset
10 2.36 2.43 3.50 6.28 13.18 3.69 6.29
100 1.28 1.28 1.77 3.65 6.25 3.13 4.54
1000 1.69 1.68 2.14 3.35 5.39 3.13 4.11
10000 1.96 1.95 2.25 3.29 13.18 7.59 13.72
100000 1.92 1.93 2.22 3.13 18.48 13.91 17.05
1000000 1.94 1.93 2.19 2.99 11.67 12.11 14.75

--
==========================================
Alex Vinokur
mailto:ale...@connect.to
http://www.simtel.net/pub/oth/19088.html
http://sourceforge.net/users/alexvn
==========================================

Alex Vinokur

Jun 1, 2003, 2:51:03 PM
{This is an exact (or near-exact) duplicate of an article
that has already been processed. -mod}

############################
### Windows, CYGWIN, g++ ###
############################


==========================================
Windows 2000
Intel(R) Celeron(R) CPU 1.70 GHz
Total Physical Memory (RAM) 523,760 KB
Available Physical Memory 389,360 KB
Total Virtual Memory 1,407,956 KB
Available Virtual Memory 1,119,024 KB
Page File Space 884,196 KB

CYGWIN_NT-5.0 1.3.22(0.78/3/2)
GNU g++ version 3.2 20020927 (prerelease)


The version of the C++ library in compressed ISO date format :

__GLIBCPP__ == 20020927
==========================================


--------------- No optimization ---------------
size array v-ptr v-iter deque list set multiset

10 2.83 3.00 4.58 6.79 19.57 5.27 7.58
100 1.25 1.25 2.34 3.67 18.80 9.37 11.69
1000 2.78 2.78 4.93 6.84 15.65 8.43 10.01
10000 2.74 2.75 4.79 6.41 21.99 11.61 17.77
100000 2.45 2.44 4.29 5.68 24.75 16.83 20.59
1000000 2.57 2.58 4.54 5.91 22.46 21.29 25.45


--------------- Optimization O1 ---------------
size array v-ptr v-iter deque list set multiset

10 3.50 3.56 4.15 5.94 10.52 3.21 5.40
100 1.59 1.60 1.96 3.71 5.54 2.92 4.35
1000 1.92 1.92 2.10 3.25 4.98 2.92 3.86
10000 1.96 1.96 2.07 3.19 12.64 7.20 12.92
100000 1.83 1.84 1.91 2.80 17.07 12.86 16.46
1000000 1.97 1.98 2.04 2.92 14.06 16.91 20.91


--------------- Optimization O2 ---------------
size array v-ptr v-iter deque list set multiset

10 3.79 3.56 3.56 5.68 10.52 3.14 5.38
100 1.61 1.61 1.71 3.54 5.52 2.89 4.34
1000 1.92 1.92 2.08 3.20 4.88 2.94 3.85
10000 1.97 1.97 2.13 3.14 12.22 7.23 12.79
100000 1.86 1.84 1.99 2.76 16.77 12.96 16.46
1000000 1.98 2.00 2.11 2.86 13.84 16.91 20.85


--------------- Optimization O3 ---------------
size array v-ptr v-iter deque list set multiset

10 3.33 3.35 3.56 6.02 10.57 3.27 5.18
100 1.26 1.25 1.71 3.50 5.60 2.98 4.27
1000 1.64 1.65 2.09 3.26 4.91 2.98 3.71
10000 1.79 1.79 2.15 3.12 11.85 7.37 13.15
100000 1.86 1.87 2.17 3.01 18.30 14.32 18.03
1000000 1.85 1.87 2.13 2.86 13.83 17.11 20.95


--
==========================================
Alex Vinokur
mailto:ale...@connect.to
http://www.simtel.net/pub/oth/19088.html
http://sourceforge.net/users/alexvn
==========================================


Alex Vinokur

Jun 1, 2003, 2:51:27 PM
{This is an exact (or near-exact) duplicate of an article
that has already been processed. -mod}

###########################
### Windows, DJGPP, gpp ###
###########################


==========================================
Windows 2000
Intel(R) Celeron(R) CPU 1.70 GHz
Total Physical Memory (RAM) 523,760 KB
Available Physical Memory 389,360 KB
Total Virtual Memory 1,407,956 KB
Available Virtual Memory 1,119,024 KB
Page File Space 884,196 KB

DJGPP 2.03
GNU gpp version 3.2.1


The version of the C++ library in compressed ISO date format :

__GLIBCPP__ == 20021119
==========================================


--------------- No optimization ---------------
size array v-ptr v-iter deque list set multiset

10 2.09 2.25 3.57 5.16 20.55 5.44 8.41
100 2.86 2.86 5.22 7.86 19.89 9.12 11.43
1000 2.86 2.86 5.27 7.09 17.20 8.41 10.00
10000 2.80 3.24 4.95 6.37 22.58 11.21 17.09
100000 2.42 2.42 4.34 5.60 25.33 16.70 20.66
1000000 2.58 2.58 4.62 5.82 22.91 21.15 25.66


--------------- Optimization O1 ---------------
size array v-ptr v-iter deque list set multiset

10 2.97 3.02 3.63 4.78 11.54 3.35 6.32
100 1.59 1.65 1.98 3.24 5.99 2.86 4.40
1000 1.92 1.92 2.14 3.13 5.33 2.97 3.85
10000 1.98 1.98 2.09 3.13 13.24 7.36 13.30
100000 1.76 1.81 1.87 2.69 17.09 12.97 16.54
1000000 1.98 1.98 2.03 2.80 14.18 16.70 20.82


--------------- Optimization O2 ---------------
size array v-ptr v-iter deque list set multiset

10 2.91 3.02 3.08 5.05 15.49 3.41 6.26
100 1.70 1.65 1.70 3.30 5.93 2.86 4.56
1000 1.98 1.98 2.09 3.13 5.11 2.97 4.01
10000 1.98 1.98 2.09 3.08 12.58 7.20 13.19
100000 1.87 1.81 1.92 2.64 16.65 12.86 16.48
1000000 1.98 2.03 2.03 2.80 13.90 16.65 20.71


--------------- Optimization O3 ---------------
size array v-ptr v-iter deque list set multiset

10 1.59 1.70 3.08 5.27 11.65 4.84 5.82
100 1.21 1.26 1.76 3.41 5.82 2.91 4.34
1000 1.65 1.70 2.14 3.13 5.16 2.97 3.85
10000 1.81 1.81 2.14 3.08 12.97 7.20 12.91
100000 1.81 1.87 2.09 2.91 18.52 13.96 17.91
1000000 1.87 1.81 2.09 2.80 13.90 16.81 20.71


==========================================
Alex Vinokur
mailto:ale...@connect.to
http://www.simtel.net/pub/oth/19088.html
http://sourceforge.net/users/alexvn
==========================================


John Potter

Jun 1, 2003, 5:37:13 PM
On 1 Jun 2003 04:42:24 -0400, "Stephen Howe"
<NOSPAM...@dial.pipex.com> wrote:

> I prefer (tighter while loop)

Correctness comes first. The algorithm is required to keep the
first and remove the others; yours keeps the last.

> iterator first = container.begin();
> iterator last = container.end();
>
> if (first != last) {
>     iterator next = first;
>     ++next;
>
>     while (next != last) {
>         if (*first == *next)
>             container.erase(first);
>         first = next;
>         ++next;
>     }
> }

for (; next != last; ++next)
    if (*first == *next) {
        container.erase(next);
        next = first;
    }
    else
        first = next;

I agree with your choice of a pretest and nice loop over the abnormal
exit loop. But this is just style and has nothing to do with the
objective of the benchmark.

John

Carl Barron

unread,
Jun 1, 2003, 5:39:45 PM6/1/03
to
Bjarne Stroustrup <b...@research.att.com> wrote:

> void vector_pointer_test(element_t* first, element_t* last,
int number_of_times)
> {
> vector<element_t> container(first, last);
> // &*container.begin() gets us a pointer to the first element
> sort(&*container.begin(), &*container.end());
> unique(&*container.begin(), &*container.end());
> }

nit pick: vector<T,A>::end() does not need to be dereferenceable. But
a vector is stored in a single block, so sort(&container.front(),
&container.back()+1); unique(&container.front(), &container.back()+1);
is more portable :)

John Potter

unread,
Jun 1, 2003, 5:42:14 PM6/1/03
to
On 1 Jun 2003 09:05:39 -0400, trol...@shaw.ca (Troll_King) wrote:

> b...@research.att.com (Bjarne Stroustrup) wrote in message
> news:<HFnqC...@research.att.com>...

> > The primary purpose of this benchmark is to show how different
> > standard containers are in terms of performance.

> If I were a Master Class programmer then I would exchange a class or
> struct type for the 'double' built-in type and observe vector
> container performance degrade significantly compared to std::list.

Being a curious third class hacker, I made the exchange.

double
size array vectP vectI deque list listA set mult multA
10 1.67 1.73 1.75 2.81 4.95 5.88 1.55 2.56 3.12
100 0.75 0.77 0.83 1.66 2.56 3.06 1.38 2.09 2.45
1000 0.91 0.91 0.97 1.48 2.28 2.62 1.39 1.83 2.16
10000 0.84 0.89 0.92 1.34 3.17 3.48 1.44 3.78 4.25
100000 0.83 0.83 0.91 1.20 6.25 6.89 4.70 6.11 6.64
1000000 0.83 0.86 0.92 1.25 5.33 8.61 6.28 7.67 8.53

struct { char[6]; };
size array vectP vectI deque list listA set mult multA
10 7.41 7.00 7.08 8.66 8.88 9.30 5.53 7.56 7.99
100 5.45 5.55 5.56 6.19 6.06 6.14 5.14 6.30 6.80
1000 5.41 5.41 5.44 5.83 5.48 5.59 4.86 5.69 6.03
10000 5.20 5.09 5.24 5.56 6.53 6.72 5.09 7.30 7.88
100000 4.75 4.75 4.84 5.06 9.45 10.14 7.67 9.39 9.89
1000000 5.03 5.09 5.17 5.33 8.67 12.08 9.38 10.95 11.84

string
size array vectP vectI deque list listA set mult multA
10 24.78 22.34 22.61 24.06 11.36 12.95 6.94 9.88 11.44
100 15.50 14.67 14.88 15.25 7.27 8.05 5.75 7.28 8.17
1000 13.69 13.11 13.34 13.83 6.47 6.97 5.23 6.30 7.02
10000 13.67 13.55 13.56 14.09 11.05 11.86 8.67 11.55 12.73
100000 14.31 13.94 14.00 14.41 14.22 15.36 12.02 13.95 15.00
1000000 16.28 15.89 16.16 16.39 14.59 19.09 15.02 17.98 19.77

A reasonable compare point seems to be 1000. Array based shines
for double. All containers look alike for struct. Node based
shines for dynamic members. For large enough size, everything
becomes i/o bound cobol. Node based has smaller large.

The purpose of this benchmark is?

John

Siemel Naran

unread,
Jun 1, 2003, 5:43:21 PM6/1/03
to
"Stephen Howe" <NOSPAM...@dial.pipex.com> wrote in message
news:3ed9614a$0

> I also think that the deque test should be the same as the vector test and

A deque<T> is conceptually like a list of vector<T>. Thus a deque::iterator
contains two iterators -- a vector<T>::iterator for the element within
the list/segment, and a list<vector<T>>::iterator for the current
list/segment. For deque<T>::iterator::operator== the system only has to
compare the vector::iterator, assuming different deques don't share
segments. But deque<T>::operator++ is inefficient: the program has to do
operator++ on the vector::iterator (as always), then check whether we are
at the end of the segment and, if so, go to the start of the next segment
(the extra step).

But an implementation may specialize for_each, find_if, unique,sort and the
others for deque<iterator> by rewriting the algorithm as a double nested
loop. The outer loops over segments, and the inner loops over elements in
the segment. This way the inner loop does not have to concern itself with
checking if we are at the end of the segment and if so go to the start of
the next segment (the extra step).

My implementation does not have any of these specializations. Though it
does have a specialization for find_if for random access iterators for
manual loop unrolling, where instead of looping N times for an array of N
elements they loop N/4 times where the body of the loop does 4 comparisons.

With the right specializations for all the algorithms I imagine deque will
be almost the same as vector, maybe just 5% more. To test this will be a
lengthy project, though I hope someone does it. We'd have to
[ ] create a class __segmented_access_category_tag derived from
random_access_category_tag
[ ] use it in class deque::iterator and const_iterator
[ ] create member functions __segment_begin and __segment_end
[ ] create typedef to be returned by above functions
[ ] create member functions __element_begin and __element_end
[ ] create typedef to be returned by above functions
[ ] specialize all the algorithms
Will attempt this soon for some of the algorithms.

Also, for best results, the compiler should optimize small user types as it
does the builtin types. I believe KCC does this optimization.

--
+++++++++++
Siemel Naran

Andy Sawyer

unread,
Jun 2, 2003, 12:11:44 AM6/2/03
to
In article <QYxxC2B$Fc2+...@robinton.demon.co.uk>,
on 1 Jun 2003 09:04:12 -0400,
Francis Glassborow <francis.g...@ntlworld.com> wrote:

> In message <mugidv8o3uk2v58a4...@4ax.com>, John Potter
> <jpo...@falcon.lhup.edu> writes
> > > I suggest that one small addition and one small change makes
posting
> > > these results easier:
> >
> >Surely you jest. >data.txt Why would you hard wire a file name?
>
> Because it was the minimal change to hard-wiring console output.

Except the original does not have "hard-wired" console output, it has
"hard-wired" "standard output" output. Not the same thing at all. On
most platforms where this benchmark is likely to be run, this is
trivial to redirect to a file with whatever name you like (including,
but certainly not limited to, 'data.txt'). (Of course, 'data.txt' may
not be a valid filename on some platforms)

--
"Light thinks it travels faster than anything but it is wrong. No matter
how fast light travels it finds the darkness has always got there first,
and is waiting for it." -- Terry Pratchett, Reaper Man

Alex Vinokur

unread,
Jun 2, 2003, 5:31:56 AM6/2/03
to
| ######################################
| ######### Windows, DJGPP, gpp ########
| ### Using C/C++ Program Perfometer ###
| ######################################


Performance was measured using the C/C++ Program Perfometer.
http://alexvn.freeservers.com/s1/perfometer.html
http://sourceforge.net/projects/cpp-perfometer


| ==========================================
| Windows 2000
| Intel(R) Celeron(R) CPU 1.70 GHz

| DJGPP 2.03
| GNU gpp version 3.2.1

| ==========================================

| --------------------------------
| Fragments of the Raw Run Log
| Note. Full Raw Run Log can be seen at :
| http://groups.google.com/groups?th=2407b19b6bdb96c9
| --------------------------------

Comparative Analysis of Container Performance
---------------------------------------------

No optimization
: ================================
: array-10 -> 12
: array-100 -> 135
: array-1000 -> 1980
: --------------------------------
: vector-ptr-10 -> 10
: vector-ptr-100 -> 136
: vector-ptr-1000 -> 1976
: --------------------------------
: vector-iter-10 -> 22
: vector-iter-100 -> 290
: vector-iter-1000 -> 3964
: --------------------------------
: deque-10 -> 30
: deque-100 -> 390
: deque-1000 -> 5145
: --------------------------------
: list-10 -> 105
: list-100 -> 955
: list-1000 -> 11840
: --------------------------------
: set-10 -> 25
: set-100 -> 433
: set-1000 -> 5960
: --------------------------------
: multiset-10 -> 41
: multiset-100 -> 565
: multiset-1000 -> 7119
: ================================


--

Troll_King

unread,
Jun 2, 2003, 7:54:57 AM6/2/03
to
John Potter <jpo...@falcon.lhup.edu> wrote in message news:<esdkdvk1gmfe030u9...@4ax.com>...

> On 1 Jun 2003 09:05:39 -0400, trol...@shaw.ca (Troll_King) wrote:
>
> > b...@research.att.com (Bjarne Stroustrup) wrote in message
> > news:<HFnqC...@research.att.com>...
>
> > > The primary purpose of this benchmark is to show how different
> > > standard containers are in terms of performance.
>
{excessive quote snipped -mod}

If you use a class type with a copy constructor and assignment
operator, then std::vector performs a lot worse than the test types
you used, but anyway this experiment shows me that on GNU/Linux g++ 3.2
on RH 9.0, vector using iterators performs much worse than the vector
using pointers (using the original program):
.... array vecP vecI
10 2.17 2.49 5.95
100 1.68 1.70 4.20
1000 1.57 1.58 3.88
10000 1.76 1.74 3.86
100000 1.77 1.77 3.58
1000000 2.01 2.02 4.00

GNU Linux just doesn't seem to have much respect for Standard C++, but
I think all that's going to change in the future.

Stephen Howe

unread,
Jun 2, 2003, 2:04:29 PM6/2/03
to
> Correct comes first. The algorithm is required to keep the
> first and remove the others. Yours keeps the last.

It won't make any difference for double but for other element types it will.
Yes you are right.

Stephen Howe

Dhruv

unread,
Jun 2, 2003, 2:37:11 PM6/2/03
to
trol...@shaw.ca (Troll_King) wrote in message news:<39ca89a9.03060...@posting.google.com>...

> John Potter <jpo...@falcon.lhup.edu> wrote in message news:<esdkdvk1gmfe030u9...@4ax.com>...
> > On 1 Jun 2003 09:05:39 -0400, trol...@shaw.ca (Troll_King) wrote:
> >
> > > b...@research.att.com (Bjarne Stroustrup) wrote in message
> > > news:<HFnqC...@research.att.com>...
>
> > > > The primary purpose of this benchmark is to show how different
> > > > standard containers are in terms of performance.
> >
> {excessive quote snipped -mod}
> { another major snip here } ......


Well, after going through all the posts, I think that the performance
depends not only on the choice of the container, but also on the
compiler, and the particular implementation of the library. Ok, well
that's pretty obvious. Also, I read a post which showed that the
performance of the containers (such as vector, which rely on copy for
moving about their members) was pretty bad, if not pathetic. I did
not think of this originally.

If you look more closely at the performance of the map and multimap,
you will see that the performance of those for small data sets is
better, and worse for large data sets, for the SGI-derived
implementations, while it's the opposite for the Dinkumware
implementation. I'm no expert on maps and multimaps (well, never used
them before EVER!!!). Also, I think that a lot also depends upon the
allocators provided by these implementations. One fact to note is that
the SGI allocators do not return memory to the OS after their
destructor has been invoked, while others might or might not, you
never know. Here, during testing, the time for the construction and
destruction of the object, and hence the underlying allocator, had also
been taken into account, since the containers are local to their
respective functions. I do not know if this will yield expected results
for allocators that do actually give back to the OS the memory that
they acquire.

Also, I would like to know if I could implement a vector (that is
standards compliant) in a way that the elements are not contiguous in
memory. This will improve the performance of the vector for non-POD
types (I think). So, we could specialize this implementation for
non-PODs. I asked this because I'm currently trying to re-write some
standard containers that are frequently used (such as list, vector,
and later maybe map, and string) to improve their performance. This
project will be called nstl (non-standard template library), which
will work with the STL; it's not meant as a replacement.

Well, getting back to the vector, I intend making it an array of
pointers to the type to be stored. That means that all the
requirements like constant-time element access, and linear insertion
and deletion, will be satisfied, but scaled down by a huge margin for
huge data types. If anyone has implemented such a thing, please mail
me. I'll try to use it instead of starting from scratch.

Also, one more thing... How would it differ if the array_test function,
instead of having called operator new, allocated the memory from the
runtime stack of the program? Ok, I know the size should be known, but
for now assume that it is. I do not know much about cache, but I think
that the code and data caches are different. Please could someone throw
more light on that?


-Dhruv.

John Potter

unread,
Jun 2, 2003, 2:39:14 PM6/2/03
to
On 2 Jun 2003 07:54:57 -0400, trol...@shaw.ca (Troll_King) wrote:

> If you use a class type with a copy constructor and assignment
> operator, than std::vector performs a lot worse than the test types
> you used,

I'm pretty sure that std::string has those things. For small
sizes up to 1000, vector took twice the time of node based containers.
For larger sizes, the space overhead of the node based containers
destroyed their advantage and all containers look about the same.

> but anyway this experiment shows me that GNU Linux g++ 3.2
> on RH 9.0 vector using iterators performs much worse than the vector
> using pointers (using the original program) :

> .... array vecP vecI
> 10 2.17 2.49 5.95
> 100 1.68 1.70 4.20
> 1000 1.57 1.58 3.88
> 10000 1.76 1.74 3.86
> 100000 1.77 1.77 3.58
> 1000000 2.01 2.02 4.00

You really do need to find the -O switch. When you ask for slow code,
you will get it. Same code, same compiler, different box/os.

No optimization
10 2.77 3.00 4.53
100 1.27 1.26 2.33
1000 1.30 1.31 2.30
10000 1.24 1.23 2.16
100000 1.12 1.12 1.99
1000000 1.19 1.17 2.06

-O2
10 1.67 1.64 1.70
100 0.77 0.78 0.81
1000 0.89 0.88 0.97
10000 0.83 0.83 0.92
100000 0.81 0.81 0.86
1000000 0.84 0.83 0.94

John

Andy Sawyer

unread,
Jun 3, 2003, 5:26:56 AM6/3/03
to
In article <cf18e89.03060...@posting.google.com>,
on 2 Jun 2003 14:37:11 -0400,
dhru...@gmx.net (Dhruv) wrote:

> Also, I would like to know if I could implement a vector (that is
> standards compilant) in a way that the elements are not contiguous
> in memory.

A vector implemented in such a way would conform with the letter (if
not the intent) of ISO/IEC 14882:1998(E). However, TC1 makes it clear
that the storage for vector should be contiguous.

> Well, getting back to the vector, I intend making it as an array of
> pointers to the type to be stored. That means that all the
> requirements like conatant elemental access, and linear insertion, and
> deletiond will be satisfied, but scaled down by a huge margin for huge
> data types. If anyone has implemented such a thing, please mail
> me. I'll try to use it instead of starting from scratch.

There are many other possibilites with a container like this, and it's
certainly useful. It's also not limited in functionality to "huge"
types - it's handy for small types with expensive copy/assignment
operations. It's also (effectively) a deque with a page-size of 1.

Regards,
Andy S.


--
"Light thinks it travels faster than anything but it is wrong. No matter
how fast light travels it finds the darkness has always got there first,
and is waiting for it." -- Terry Pratchett, Reaper Man


Rayiner Hashem

unread,
Jun 3, 2003, 5:31:29 AM6/3/03
to
> GNU Linux just doesn't seem to have much respect for Standard C++, but
> I think all that's going to change in the future.
Odd, on my machine (Gentoo, G++ 3.2.2) the vector with iterators is
only a few tenths of a second slower than the vector with pointers.

Troll_King

unread,
Jun 3, 2003, 5:35:50 AM6/3/03
to
John Potter <jpo...@falcon.lhup.edu> wrote in message news:<lu1ndvo9q6md1n338...@4ax.com>...

Could you show me the exact instruction you entered to compile that
code using g++? I had trouble with this because simply adding -02
isn't recognized by g++.

With regard to the data types, see C++ Primer 3rd edition page 254 -
260. The std::string has a different capacity than a large complex
class, so the full penalty of std::vector isn't realized.

Francis Glassborow

unread,
Jun 3, 2003, 9:26:15 AM6/3/03
to
In message <cf18e89.03060...@posting.google.com>, Dhruv
<dhru...@gmx.net> writes

>Also, I would like to know if I could implement a vector (that is
>standards compilant) in a way that the elements are not contiguous in
>memory.

No you cannot. It is effectively a requirement that vector contain its
data in an array.


--
ACCU Spring Conference 2003 April 2-5
The Conference you should not have missed
ACCU Spring Conference 2004 Late April
Francis Glassborow ACCU

John Potter

unread,
Jun 3, 2003, 3:04:22 PM6/3/03
to
On 1 Jun 2003 04:35:35 -0400, John Potter <jpo...@falcon.lhup.edu>
wrote:

> In both cases, I observed a 10% to 20% increase in time which decreased
> as size increased. The only strange value was list at 1M where there
> was a large increase. Using gcc only. Considering that creating the
> ordered collection is NlgN and unique is N, the increase in total time
> represents more than doubling the unique time. This seems a bit
> strange.

Some investigation showed that the difference is a result of locality of
reference. For size == 1000000 whichever test, list.unique or unique(),
runs second has the large added time. Having sorted the nodes and then
deallocated them produces poor locality for the second build and sort.

As always, one wonders what the benchmark is measuring.

John Potter

unread,
Jun 3, 2003, 5:26:46 PM6/3/03
to
On 3 Jun 2003 05:35:50 -0400, trol...@shaw.ca (Troll_King) wrote:

> Could you show me the exact instruction you entered to compile that
> code using g++? I had trouble with this because simply adding -02
> isn't recognized by g++.

g++ [options] [files]

$ g++ -O2 ct.cc

John

LLeweLLyn

unread,
Jun 3, 2003, 5:56:45 PM6/3/03
to
hel...@mindspring.com (Rayiner Hashem) writes:

> > GNU Linux just doesn't seem to have much respect for Standard C++, but
> > I think all that's going to change in the future.
> Odd, on my machine (Gentoo, G++ 3.2.2) the vector with iterators is
> only a few tenths of a second slower than the vector with pointers.

[snip]

And on my machine, mandrake linux with g++ 3.3 -O3, vector with
iterators is sometimes a few percent _faster_ and sometimes a few
percent slower; e.g. the difference between a pointer and an
iterator is less then the error margin of the test.

LLeweLLyn

unread,
Jun 3, 2003, 5:57:26 PM6/3/03
to
trol...@shaw.ca (Troll_King) writes:

> John Potter <jpo...@falcon.lhup.edu> wrote in message news:<lu1ndvo9q6md1n338...@4ax.com>...
> > On 2 Jun 2003 07:54:57 -0400, trol...@shaw.ca (Troll_King)
> wrote:

[snip]


> > You rally do need to find the -O switch. When you ask for slow code,
> > you will get it. Same code, same compiler, different box/os.
> >
> > No optimization
> > 10 2.77 3.00 4.53
> > 100 1.27 1.26 2.33
> > 1000 1.30 1.31 2.30
> > 10000 1.24 1.23 2.16
> > 100000 1.12 1.12 1.99
> > 1000000 1.19 1.17 2.06
> >
> > -O2
> > 10 1.67 1.64 1.70
> > 100 0.77 0.78 0.81
> > 1000 0.89 0.88 0.97
> > 10000 0.83 0.83 0.92
> > 100000 0.81 0.81 0.86
> > 1000000 0.84 0.83 0.94

[snip]


> Could you show me the exact instruction you entered to compile that
> code using g++? I had trouble with this because simply adding -02

_Letter_ O, not numeral 0.

> isn't recognized by g++.

[snip]

Judging from open source projects, -O2 is probably the most widely used
flag for compiling with g++.

Louis Lavery

unread,
Jun 3, 2003, 6:00:26 PM6/3/03
to

Bjarne Stroustrup <b...@research.att.com> wrote in message
news:HFnqC...@research.att.com...

> /* Standard Container Benchmark
>
> Version 0.9, May 23, 2003
>
> The primary purpose of this benchmark is to show how different standard
> containers are in terms of performance. We have observed that programmers

[snip]


>
> const char header[] =
> "\tarray"
> "\tvector with pointers"
> "\tvector with iterators"
> "\tdeque"
> "\tlist"
> "\tset"
> "\tmultiset";

Um, why not std::string?

Louis.

LLeweLLyn

unread,
Jun 4, 2003, 7:49:17 AM6/4/03
to
"Louis Lavery" <Lo...@devilsChimney.co.uk> writes:

> Bjarne Stroustrup <b...@research.att.com> wrote in message
> news:HFnqC...@research.att.com...
>> /* Standard Container Benchmark
>>
>> Version 0.9, May 23, 2003
>>
>> The primary purpose of this benchmark is to show how different standard
>> containers are in terms of performance. We have observed that programmers
>
> [snip]
>
>
>>
>> const char header[] =
>> "\tarray"
>> "\tvector with pointers"
>> "\tvector with iterators"
>> "\tdeque"
>> "\tlist"
>> "\tset"
>> "\tmultiset";
>
> Um, why not std::string?

[snip]

Perhaps because it is not a container? However, I see your point; every
now and then somebody posts really slow code using string to this
newsgroup, and they are told to use vector<char> instead.

Alex Vinokur

unread,
Jun 4, 2003, 7:51:45 AM6/4/03
to
More performance test suites, measured using the C/C++ Program Perfometer.
http://alexvn.freeservers.com/s1/perfometer.html
http://sourceforge.net/projects/cpp-perfometer


1. for-loop vs. for_each algorithm in vector, string, list, set.
http://groups.google.com/groups?selm=3C8DD296.A7304148%40bigfoot.com

2. find() algorithm vs. find() method in vector, list, set, map
http://groups.google.com/groups?selm=3C8C7B97.E4B5CA9C%40bigfoot.com

3. Access to array, vector, basic_string
http://groups.google.com/groups?selm=3C8C619E.D3F557D5%40bigfoot.com

4. switch vs. dynamic_cast
http://groups.google.com/groups?selm=15eacd17.0204300451.209a527f%40posting.google.com

5. operator[] vs. at()
http://groups.google.com/groups?selm=aa8kpg%248oo15%241%40ID-79865.news.dfncis.de

=====================================
Alex Vinokur
mailto:ale...@connect.to

http://mathforum.org/library/view/10978.html

Stephen Howe

unread,
Jun 4, 2003, 9:30:21 AM6/4/03
to
STLPerf1.cpp is the benchmark unaltered.
STLPerf2.cpp is the benchmark altered so that deque uses the range
constructor and no copy(), and multiset uses the code that John Potter
proposed with my wrapper.

Compiler: VC7.

The second set of figures in each case is STLport 4.5.3, an interesting
comparison.

cl /O2 /GX /ML STLPerf1.cpp:
size array v-ptr v-iter deque list set multiset
10 1.98 2.03 2.25 6.66 15.38 3.61 6.94
100 1.16 1.14 1.33 4.13 5.33 2.11 3.69
1000 1.14 1.16 1.28 3.47 3.89 1.80 3.30
10000 1.05 1.05 1.16 3.34 3.84 1.92 4.14
100000 1.09 1.08 1.16 3.17 6.30 4.89 7.66
1000000 1.05 1.08 1.13 3.03 7.00 6.55 9.09

cl /O2 /GX /ML /I\STLport-4.5.3\stlport STLPerf1.cpp:
size array v-ptr v-iter deque list set multiset
10 1.36 1.34 1.36 1.84 3.72 1.00 1.48
100 0.67 0.67 0.67 1.13 1.45 1.00 1.41
1000 0.70 0.69 0.70 0.98 1.28 1.02 1.30
10000 0.72 0.70 0.66 0.91 1.36 1.02 1.59
100000 0.72 0.72 0.72 1.00 5.38 3.70 5.20
1000000 0.77 0.77 0.75 1.02 3.17 3.39 4.47

cl /O2 /GX /ML STLPerf2.cpp:
size array v-ptr v-iter deque list set multiset
10 1.95 2.03 2.23 6.56 15.48 3.67 7.00
100 1.14 1.14 1.31 4.11 5.33 2.13 3.70
1000 1.16 1.14 1.30 3.42 3.91 1.81 3.30
10000 1.05 1.05 1.16 3.27 3.83 1.94 4.11
100000 1.08 1.08 1.16 3.11 6.31 4.91 7.63
1000000 1.06 1.06 1.13 2.97 6.98 6.58 9.13

cl /O2 /GX /ML /I\STLport-4.5.3\stlport STLPerf2.cpp:
size array v-ptr v-iter deque list set multiset
10 1.34 1.33 1.34 1.80 3.73 1.03 1.45
100 0.69 0.67 0.67 1.09 1.48 1.03 1.33
1000 0.70 0.69 0.70 0.98 1.27 1.05 1.28
10000 0.72 0.70 0.64 0.91 1.36 1.00 1.56
100000 0.70 0.73 0.70 0.95 5.38 3.70 5.20
1000000 0.77 0.75 0.77 0.97 3.17 3.39 4.45

Stephen Howe

Alex Vinokur

unread,
Jun 4, 2003, 12:31:18 PM6/4/03
to
Here are some performance test suites for comparative analysis: STL vs. the Standard C Library.

The following functions (methods, operators) were measured.

----------------------------------------------------
Standard C Library | STL
----------------------------------------------------
strcpy | operator= (string)
strcat | operator+= (string)
strlen | size () (string)
char* | constructor&destructor (string)
char-array | constructor&destructor (string)
malloc&free | new&delete (string)
(malloc+init)&free | new&delete (string)
----------------------------------------------------


Measurement results:
http://groups.google.com/groups?selm=7dqehp%249jl%241%40nnrp1.dejanews.com
http://groups.google.com/groups?selm=36822F3A.DBFF0811%40tibam.elex.co.il
http://groups.google.com/groups?selm=368734F0.70B29CE2%40tibam.elex.co.il


=====================================
Alex Vinokur
mailto:ale...@connect.to

http://mathforum.org/library/view/10978.html

tom_usenet

unread,
Jun 4, 2003, 1:57:26 PM6/4/03
to
On 1 Jun 2003 17:39:45 -0400, cbar...@ix.netcom.com (Carl Barron)
wrote:

>Bjarne Stroustrup <b...@research.att.com> wrote:
>
>> void vector_pointer_test(element_t* first, element_t* last,
> int number_of_times)
>> {
>> vector<element_t> container(first, last);
>> // &*container.begin() gets us a pointer to the first element
>> sort(&*container.begin(), &*container.end());
>> unique(&*container.begin(), &*container.end());
>> }
> nit pick: vector<T,A>::end() does not need to be dereferenceable. but
>a vector is in a single block so sort(&container.front(),
>&container.back()+1); unique(&container.front(), &container.back()+1);
>is more portable:)

The original code should indeed assert on a "safe" debugging stl mode,
such as Dinkumware's or STLport's.

Tom

tom_usenet

unread,
Jun 4, 2003, 1:58:34 PM6/4/03
to
On 3 Jun 2003 05:35:50 -0400, trol...@shaw.ca (Troll_King) wrote:

>Could you show me the exact instruction you entered to compile that
>code using g++? I had trouble with this because simply adding -02
>isn't recognized by g++.

O for Optimization, not 0 for zero. -O2.

>
>With regard to the data types, see C++ Primer 3rd edition page 254 -
>260. The std::string has a different capacity than a large complex
>class, so the full penalty of std::vector isn't realized.

But std::string allocates memory. If it isn't reference counted, then
copying it will have a large overhead - larger than most complex types
that don't allocate dynamic memory.

However, I think libstdc++ 3 does use a reference counted string type,
so a more useful test (that isn't affected by different string
implementations) might be on a large object, such as, say:

struct s
{
int array[100];
};

Tom

Ben Hutchings

unread,
Jun 5, 2003, 3:21:41 AM6/5/03
to
In article <3edd3422$0$18488$cc9e...@news.dial.pipex.com>,

Stephen Howe wrote:
> STLPerf1.cpp is the benchmark unaltered
> STLPerf2.cpp is the benchmark altered so that deque uses range constuctor
> and no copy() and multiset uses the code that John Potter proposed with my
> wrapper.
>
> Compiler VC7.
<snip>

Some hardware details would be useful. In the results you posted, I
noticed a sharp decrease in performance of the node-based containers
between 10,000 and 100,000 elements which is probably due to overflowing
a cache. The point at which this happens will depend on element size
and cache size.

Artem Livshits

unread,
Jun 5, 2003, 3:39:50 AM6/5/03
to
b...@research.att.com (Bjarne Stroustrup) wrote in message news:<HFnqC...@research.att.com>...

> /* Standard Container Benchmark
<snip>

It's interesting to add a couple of dimensions to this test. For one
thing, it's interesting to see how std::vector behaves when the size
of the input data is not known in advance. So, given the results of
the original test (MSVC 7.1 -O2 -GX -Zi; with the accompanying STL)

size array vec/ptr vec/itr deque list set mset
10 1.22 1.28 1.83 4.20 11.52 2.64 5.45
100 0.72 0.70 1.08 2.56 3.86 1.56 2.91
1000 0.72 0.72 1.02 2.08 2.83 1.31 2.59
10000 0.70 0.70 0.92 2.09 2.81 1.48 2.95
100000 0.67 0.66 0.86 1.95 4.05 3.20 4.94
1000000 0.63 0.66 0.80 1.83 4.44 4.27 5.88

the results of the test where the lines

// vector<element_t> container(first, last);

are substituted with

vector<element_t> container;
copy(first, last, back_inserter(container));

are

size array vec/ptr vec/itr deque list set mset
10 1.22 3.00 3.47 4.19 11.52 2.64 5.44
100 0.70 0.89 1.25 2.64 3.86 1.58 2.89
1000 0.70 0.77 1.05 2.13 2.83 1.33 2.59
10000 0.69 0.83 1.05 2.09 2.81 1.47 2.97
100000 0.67 0.75 0.94 1.97 4.06 3.17 4.92
1000000 0.63 0.73 0.88 1.83 4.44 4.26 5.81

std::vector is now closer to the others, as it works under the same
conditions as they do. (The array is faster but doesn't count, as it
fails to provide the functionality.) Yet for large datasets std::vector
is significantly faster.

Another interesting thing to try is to see how much of a performance
hit comes from memory allocation. So I wrote a simple brain-dead
allocator and used it instead the default, and the results of the test
that uses my allocator instead of the default are:

size array vec/ptr vec/itr deque list set mset
10 1.25 1.13 1.72 3.48 4.44 1.13 2.19
100 0.70 0.70 1.09 2.22 2.45 0.92 1.39
1000 0.72 0.72 1.03 1.91 2.03 0.81 1.08
10000 0.70 0.67 0.89 1.69 1.86 0.77 1.22
100000 0.67 0.66 0.83 1.61 2.73 2.00 2.80
1000000 0.64 0.64 0.78 1.59 3.22 3.08 3.78

So the numbers are improved for "allocation-bounded" containers, and
std::set gets really close to vector for small datasets (std::vector
in this test still cheats and takes advantage of known size). For
large datasets std::vector is significantly faster, apparently because
node-based containers have poorer locality.

And finally the combination of the tweaks (my allocator with copying
data into std::vector):

size array vec/ptr vec/itr deque list set mset
10 1.23 1.86 2.44 3.48 4.41 1.09 2.20
100 0.70 0.89 1.25 2.22 2.45 0.92 1.36
1000 0.73 0.83 1.13 1.91 2.03 0.81 1.06
10000 0.70 0.73 0.97 1.69 1.86 0.75 1.23
100000 0.67 0.73 0.91 1.63 2.72 2.02 2.80
1000000 0.64 0.72 0.86 1.59 3.20 3.08 3.80

std::set even manages to outperform std::vector for size 10 datasets
and is basically the same for small to medium size datasets but on
large datasets apparently locality bites it.

The allocator:

class arena_base
{
public:
static void construct_all();
static void reset_all();
protected:
enum { max_size = largest_size * 3 };

explicit arena_base(size_t s);
~arena_base() { /* leak, who cares */ }

void * const *memory() const { return memory_; }

size_t allocated_[2];
size_t all_count_;
private:
arena_base(const arena_base &);
arena_base &operator=(const arena_base &);

void construct();
void reset() { memset(&allocated_, 0, sizeof(allocated_)); all_count_ = 0; }

const size_t obj_size_;
void *memory_[2];

arena_base *next_;
static arena_base *static_head;
};

arena_base *arena_base::static_head = 0;

void arena_base::construct_all()
{
for(arena_base *p = static_head; p; p = p->next_)
p->construct();
}

void arena_base::reset_all()
{
for(arena_base *p = static_head; p; p = p->next_)
p->reset();
}

arena_base::arena_base(size_t s)
: obj_size_(s)
, next_(static_head)
, all_count_(0)
{
memset(&allocated_, 0, sizeof(allocated_));
memset(&memory_, 0, sizeof(memory_));
static_head = this;
}

void arena_base::construct()
{
memory_[0] = operator new(max_size * obj_size_);
memory_[1] = operator new(max_size * obj_size_);
reset();
}

template <class T>
class arena : public arena_base
{
public:
arena() : arena_base(sizeof(T)) {}
T *alloc(size_t n);
void free(T *p, size_t n);
private:
T * const *memory() const { return (T * const *)(arena_base::memory()); }
};

template <class T>
T *arena<T>::alloc(size_t n)
{
const size_t index = all_count_ & 1;
size_t &allocated = allocated_[index];
T * const result = memory()[index] + allocated;

allocated += n;

if (allocated > max_size)
{
allocated -= n;
throw std::bad_alloc();
}

++all_count_;
return result;
}

template <class T>
void arena<T>::free(T *p, size_t n)
{
// for speed, handle only the block at the end

T * const end = p + n;

if (end == memory()[0] + allocated_[0])
{
allocated_[0] -= n;
}
else if(end == memory()[1] + allocated_[1])
{
allocated_[1] -= n;
}
}

template<class T>
class my_all
{
public:
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef T* pointer;
typedef const T* const_pointer;
typedef T& reference;
typedef const T& const_reference;
typedef T value_type;
template <class U> struct rebind { typedef my_all<U> other; };

pointer address(reference r) const { return &r; }
const_pointer address(const_reference r) const { return &r; }

my_all() {}
my_all(const my_all &) {}
template<class U>
my_all(const my_all<U> &) {}
template<class U>
my_all& operator=(const my_all<U> &) { return *this; }

void deallocate(pointer p, size_type n) { static_arena.free(p, n); }

pointer allocate(size_type n) { return static_arena.alloc(n); }

pointer allocate(size_type n, const void *) { return allocate(n); }
void construct(pointer p, const T& t) { std::allocator<T>().construct(p, t); }
void destroy(pointer p) { std::allocator<T>().destroy(p); }
size_type max_size() const { return std::allocator<T>().max_size(); }

private:
static arena<T> static_arena;
};

template <class T> arena<T> my_all<T>::static_arena;

template<class T, class U>
inline bool operator==(const my_all<T> &, const my_all<U> &)
{
return true;
}

template<class T, class U>
inline bool operator!=(const my_all<T> &t, const my_all<U> &u)
{
return !(t == u);
}


// run should include a call to
// arena_base::reset_all() before
// calling the test function

// while (number_of_times-- > 0)
// {
// arena_base::reset_all();
// f(first,last,number_of_times);
// }

// main should include a call to
// arena_base::construct_all()
// before running tests


Artem Livshits
Brainbench MVP for C++
http://www.brainbench.com

John Potter

Jun 5, 2003, 10:02:45 AM
On 4 Jun 2003 13:58:34 -0400, tom_u...@hotmail.com (tom_usenet) wrote:

> However, I think libstdc++ 3 does use a reference counted string type,
> so a more useful test (that isn't effected by different string
> implementations) might be on a large object, such as, say:

> struct s
> {
> int array[100];
> };

Hum. If sizeof(int) is 4, we get 4 * 100 * 1000000 * 2 for the vector.
Will your implementation handle that? Then multiply that by 5 for
multimap, 3 for list, and 5/2 for set. :)

It might be interesting to note the number of tests performed for
each size. It does show the cost of initialization.

10 100 1000 10000 100000 1000000
599999 29999 2000 149 11 1

Here are some big tests.

size array vectP vectI deque list listA set mult multA

int[16]
10 7.59 7.72 7.80 10.14 6.83 7.31 2.16 3.83 4.64
100 4.44 4.38 4.67 6.02 3.48 3.61 1.67 2.80 3.39
1000 3.88 3.83 4.09 5.11 3.12 3.12 1.62 2.42 2.89
10000 4.52 4.64 4.77 5.14 7.36 7.83 4.08 6.34 7.22
100000 4.20 4.14 4.31 4.59 7.53 8.12 5.50 7.31 7.94
1000000 4.38 4.36 4.47 5.03 7.34 11.16 7.39 10.05 12.23

int[32]
10 9.81 9.97 10.00 14.27 53.86 53.17 8.30 15.91 16.83
100 5.77 5.73 6.00 8.28 11.23 11.45 4.59 8.50 9.38
1000 6.17 6.17 6.50 8.02 8.06 8.48 3.69 7.19 8.05
10000 7.34 7.64 7.75 8.23 9.77 10.78 5.94 9.91 11.19
100000 6.92 7.05 7.05 7.49 9.36 10.12 6.88 9.92 11.02
1000000 7.70 7.95 8.05 9.65 12.20 13.98 12.28 15.81 17.95

int[64]
10 17.00 20.58 21.12 26.98 54.73 54.55 8.92 17.05 19.12
100 12.17 12.30 13.44 16.05 11.39 11.88 5.22 9.25 10.41
1000 16.78 16.77 17.80 17.19 8.88 10.47 4.49 8.77 10.84
10000 16.08 16.13 16.91 16.72 10.34 11.92 6.56 10.55 12.70
100000 14.34 14.61 14.97 15.06 10.02 11.44 7.28 10.62 12.50

int[128]
10 23.23 29.06 32.08 41.20 53.89 56.77 9.47 18.00 20.27
100 18.03 18.17 19.25 25.09 12.03 13.11 5.30 9.91 11.17
1000 28.55 28.72 30.14 30.28 10.88 14.34 5.62 10.66 14.17
10000 26.62 26.52 28.05 28.23 11.73 14.50 7.62 12.30 15.14
100000 23.36 23.55 24.09 24.94 11.23 13.59 8.17 12.42 14.88

int[256]
10 36.23 49.02 44.84 55.44 55.81 59.34 10.27 19.55 22.53
100 34.86 35.06 37.44 44.25 19.39 21.94 6.74 15.66 18.88
1000 51.95 52.45 54.23 51.02 16.33 22.06 8.58 16.36 22.14
10000 47.69 47.78 49.17 47.25 15.52 19.95 9.61 16.44 21.05
100000 41.58 41.61 42.67 42.19 14.66 18.30 10.45 16.92 20.05

And just for fun a big dynamic allocator with a copy/swap op=.
struct { int* /*16*/ double* /*8*/ }

10 187.03 165.92 165.09 168.41 27.50 38.31 12.61 25.31 37.08
100 111.08 100.39 100.34 101.14 14.20 19.58 7.34 13.78 19.59
1000 94.47 87.45 87.25 88.09 13.02 16.73 6.89 12.70 16.88
10000 88.53 83.87 83.73 84.73 16.56 20.34 10.75 17.25 21.09
100000 77.56 73.88 73.70 74.88 16.45 19.89 13.95 18.59 21.38

On 29 May 2003 14:29:36 -0400, b...@research.att.com (Bjarne Stroustrup)
wrote:

> The primary purpose of this benchmark is to show how different standard
> containers are in terms of performance. We have observed that programmers
> often use sets, multisets, deques in the situations where sorted vectors
> are the optimal solutions. Similarly, programmers often choose lists simply
> because they do a few insertions and deletions without knowing that vectors
> are more efficient at that for small containers.

Is it serving its purpose?

John

John Potter

Jun 5, 2003, 10:03:12 AM
On 4 Jun 2003 09:30:21 -0400, "Stephen Howe"
<NOSPAM...@dial.pipex.com> wrote:

> The second set of figures in each case is STLPort 4.5.3, an interesting
> comparison.

> cl /O2 /GX /ML STLPerf1.cpp:

> size array vectorp vectori deque list set multiset


> 10 1.98 2.03 2.25 6.66 15.38 3.61 6.94
> 100 1.16 1.14 1.33 4.13 5.33 2.11 3.69
> 1000 1.14 1.16 1.28 3.47 3.89 1.80 3.30
> 10000 1.05 1.05 1.16 3.34 3.84 1.92 4.14
> 100000 1.09 1.08 1.16 3.17 6.30 4.89 7.66
> 1000000 1.05 1.08 1.13 3.03 7.00 6.55 9.09

> cl /O2 /GX /ML /I\STLport-4.5.3\stlport STLPerf1.cpp:

> 10 1.36 1.34 1.36 1.84 3.72 1.00 1.48
> 100 0.67 0.67 0.67 1.13 1.45 1.00 1.41
> 1000 0.70 0.69 0.70 0.98 1.28 1.02 1.30
> 10000 0.72 0.70 0.66 0.91 1.36 1.02 1.59
> 100000 0.72 0.72 0.72 1.00 5.38 3.70 5.20
> 1000000 0.77 0.77 0.75 1.02 3.17 3.39 4.47

The most interesting point may be that the array case shows the
difference in the quality of the algorithms. Deque shows more
difference which would be the result of the iterators. The
other containers are algorithms again.

John

Daniel Spangenberg

Jun 5, 2003, 10:10:57 AM
Hello, Alex Vinokur,

Alex Vinokur schrieb:

> Here are some performance testsuites for comparative analysis : STL vs. Standard C Library.
>

> [snip]

This list of performance tests is quite extensive, but at least some of them
seem rather outdated now (I remember year numbers of 1995/1998 or so).
Did you also repeat these measurements under more modern compilers?

Greetings from Bremen,

Daniel Spangenberg

Bo Persson

Jun 5, 2003, 4:52:42 PM

"Artem Livshits" <a_e_li...@yahoo.com> skrev i meddelandet
news:6dc54885.03060...@posting.google.com...

No, *this* is cheating!

You ask vector to give up one of its natural advantages: that it
actually *can* adapt to a known input. When you do that, it's much like
requiring basketball players to all be 6 feet tall. Is it cheating to
be 7 feet, and use it?


Bo Persson
bo...@telia.com

Artem Livshits

Jun 6, 2003, 7:48:09 AM

"Bo Persson" <bo...@telia.com> wrote in message
news:2uMDa.13442$dP1....@newsc.telia.net...

>
> "Artem Livshits" <a_e_li...@yahoo.com> skrev i meddelandet
> news:6dc54885.03060...@posting.google.com...
> > b...@research.att.com (Bjarne Stroustrup) wrote in message
> news:<HFnqC...@research.att.com>...
> >
> > > /* Standard Container Benchmark
> > <snip>
> >
> > It's interesting to add a couple dimensions to this test.
>
> No, *this* is cheating!
>
> You ask vector to give up one of its natural advantages

Well, then, why don't we just compare how fast we can fill a container? Do
something like

void vector_test(element_t* first, element_t* last, int number_of_times)
{
vector<element_t> container(first, last);
}

//...

void set_test(element_t* first, element_t* last, int number_of_times)
{
set<element_t> container(first, last);
}

according to your logic this must be the fairest test: std::vector has "a
natural advantage" over a std::set in that it doesn't have to sort &
unique the input when it's filled (and the code looks the same so it must
be the fairest test indeed).

Anyway, I didn't say "use instead", I said "an additional dimension".
It's quite a valid real-life usage example; besides, "to compare container
performance" kind of implies "when they are doing the same amount of
work" unless stated otherwise.


Artem Livshits
Brainbench MVP for C++
http://www.brainbench.com

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

John Potter

Jun 6, 2003, 12:29:44 PM
On 5 Jun 2003 03:39:50 -0400, a_e_li...@yahoo.com (Artem Livshits)
wrote:

> It's interesting to add a couple dimensions to this test. For one
> thing, it's interesting to see how std::vector behaves when the size
> of the input data is not known in advance. So, given the results of
> the original test (MSVC7.1 -O2 -GX -Zi; with the accompanying STL)

> size array vec/ptr vec/itr deque list set mset
> 10 1.22 1.28 1.83 4.20 11.52 2.64 5.45
> 100 0.72 0.70 1.08 2.56 3.86 1.56 2.91
> 1000 0.72 0.72 1.02 2.08 2.83 1.31 2.59
> 10000 0.70 0.70 0.92 2.09 2.81 1.48 2.95
> 100000 0.67 0.66 0.86 1.95 4.05 3.20 4.94
> 1000000 0.63 0.66 0.80 1.83 4.44 4.27 5.88

> // vector<element_t> container(first, last);

> are substituted with

> vector<element_t> container;
> copy(first, last, back_inserter(container));

This is likely to copy twice as many times during construction.

> size array vec/ptr vec/itr deque list set mset
> 10 1.22 3.00 3.47 4.19 11.52 2.64 5.44
> 100 0.70 0.89 1.25 2.64 3.86 1.58 2.89
> 1000 0.70 0.77 1.05 2.13 2.83 1.33 2.59
> 10000 0.69 0.83 1.05 2.09 2.81 1.47 2.97
> 100000 0.67 0.75 0.94 1.97 4.06 3.17 4.92
> 1000000 0.63 0.73 0.88 1.83 4.44 4.26 5.81

> std::vector is now closer to others as it works on the same conditions
> as others do. (Array is faster but doesn't count as it fails to
> provide the functionality). Yet for large datasets std::vector is
> significantly faster.

It helps to know what the test does.
1. Create the data structure. O(N)
2. Sort the data structure. O(NlgN) or O(N*N) for small N
3. Remove the duplicates. O(N)
For small N, the construction is a major part of the total. As size
increases, the sort dominates and vector shines. Interesting to note
that the double copy for vector still beats deque.

The allocator changes are also only involved in the build phase. Again
showing something for small size and nothing for large size. The sort,
build and remove are in parallel for set, but the sort still dominates.

It is a simple fact that creating a sorted collection is always fastest
when filling a vector and then sorting. There is no competition.

John

Stephen Howe

Jun 6, 2003, 7:07:00 PM
> Some hardware details would be useful.

Sorry, my omission. (Moderator please allow this through, I should have
posted this originally with results).

Windows 2K
256 MB RAM, 2 GHz Intel processor,
Level 2 Cache: 512K
Level 1 Data Cache: 8K

> In the results you posted, I
> noticed a sharp decrease in performance of the node-based containers
> between 10,000 and 100,000 elements which is probably due to overflowing
> a cache.

Hrmmm. 100,000 * (sizeof(double) + 2 pointers for list, or 3 pointers for
the other containers) exceeds the L2 cache. That is probably it.

Stephen Howe

Artem Livshits

Jun 7, 2003, 5:45:28 AM
John Potter <jpo...@falcon.lhup.edu> wrote in message news:<p0evdvg0a6uu5vmt6...@4ax.com>...

> On 5 Jun 2003 03:39:50 -0400, a_e_li...@yahoo.com (Artem Livshits)
> wrote:
>
> > For one
> > thing, it's interesting to see how std::vector behaves when the size
> > of the input data is not known in advance.
>
> This is likely to copy twice as many times during construction.
>

Yes, but there is a tricky point that may be important for large
datasets: it doesn't *just* copy twice as many, it copies and touches
the whole dataset. There is a difference between copying the same 100
bytes 2000 times and copying 100000 bytes 2 times.

Sorting would probably shadow the difference that would arise from
that, but it's interesting to see if it would.

> For small N, the construction is a major part of the total. As size
> increases, the sort dominates and vector shines.

Well, sort of. The full time spent in vector_test (original test) is
something like C1*N + C2*(N lg N) + C3*N, where C1 is the constant for
the construction, C2 is the constant for sorting and C3 is the constant
for unique-ing. Depending on what those constants really are we can
get all kinds of results, but I would expect C1 to be at least not
greater than C3 and C2, and most likely it's less. So even assuming
that they are equal for N=10 the construction should be responsible
for about 19% of the time (and most likely less). Not something I
would call "a major part of the total".

> Interesting to note
> that the double copy for vector still beats deque.

I looked at the implementation of std::deque I used in the tests; the
std::deque is basically an array of pointers to chunks of 2 elements
(i.e. the array of pointers is only half the size of the vector and it
has to grow as well, so I'm not surprised).

> The allocator changes are also only involved in the build phase. Again
> showing something for small size and nothing for large size. The sort,

> build and remove are in parallel for set, but the sort still dominates.

Ok, to sort out any confusion about the build phase I modified
array_test to include only sorting, so it became

void run_my_array_test(element_t* first, element_t* last, int number_of_times)
{
const int N = number_of_times;
element_t **arrays = new element_t * [N];

while (number_of_times-- > 0)
{
arrays[number_of_times] = new element_t[last - first];
copy(first, last, arrays[number_of_times]);
}
number_of_times = N;

start_timer();

while (number_of_times-- > 0)
sort(arrays[number_of_times], arrays[number_of_times] + (last - first));

result_times.push_back(timer());

number_of_times = N;
while (number_of_times-- > 0)
delete[] arrays[number_of_times];

delete[] arrays;
}

and the results are (using my custom allocator):

size array set mset
10 0.98 1.09 2.16
100 0.70 0.94 1.39
1000 0.70 0.80 1.06
10000 0.64 0.77 1.20
100000 0.61 2.00 2.80
1000000 0.58 3.08 3.89

It's easy to see that both std::set and std::multiset have a pretty
steady overhead factor over std::sort for sizes 10-10000 and that it
jumps significantly for larger sizes. Now the complexity should be the
same and there shouldn't be noticeable allocation overhead, so there
must be something else.

The following test attracted my attention:

void run_my_memcpy_test(element_t* first, element_t* last, int number_of_times)
{
element_t *array = new element_t[last - first];

start_timer();

while (number_of_times-- > 0)
memcpy(array, first, (last - first) * sizeof(*first));

result_times.push_back(timer());

delete[] array;
}

// the call is
// run_my_memcpy_test(buffer, buffer_end, largest_size * 200 / length);
// so that it copies the same amount of bytes for each size

the results are:

size memcpy
10 0.52
100 0.28
1000 0.28
10000 0.39
100000 1.86
1000000 1.88

So, N=10: loop overhead (memcpy is generated as a native assembly
instruction). N=100,1000: no noticeable cache pressure. N=10000: some
cache pressure. N=100000 and higher: 6.5+ times slower than N=1000 and
lower. (So there *is* a difference between copying the same 100 bytes
2000 times and copying 100000 bytes 2 times.)

So I would expect, like I said originally, that poor cache locality is
responsible for poor performance of std::set and std::multiset on
large datasets. I think that a T-Tree with an appropriate node size
should mitigate this problem, but I don't have an implementation so I
can't check if it does and how well.

> It is a simple fact that creating a sorted collection is always fastest
> when filling a vector and then sorting. There is no competition.

Yes, and results would probably be even more favorable to std::vector
if there were no duplicates. Yet it's interesting to see when it
starts to matter (and how much).

It's worth mentioning though that std::set has a simpler invariant as
a sorted container than std::vector (etc.): using std::vector (etc.)
as a sorted container would require it to be in 2 states: "writeable
unsorted" and "read-only sorted" (run-time state); while just the fact
it's an std::set guarantees that the container is always sorted
(compile-time guarantee).

So for example, if there is a function

void filterLineNumbers(const std::vector<int> &filter)
{
assert(is_sorted(filter));

// use binary_search
}

// or

void filterLineNumbers(const std::set<int> &filter)
{
// no need for assert -- just works

// use filter.count(x)
}

and it's known that the filter is filled only once and then used for
searching, from the performance perspective the std::vector variant
should be better, however the std::set variant should be easier to
call correctly.


Artem Livshits
Brainbench MVP for C++
http://www.brainbench.com

Alex Vinokur

Jun 7, 2003, 5:50:13 AM

"Daniel Spangenberg" <d...@bdal.de> wrote in message news:3EDEE71B...@bdal.de...

> Hello, Alex Vinokur,
>
> Alex Vinokur schrieb:
>
> > Here are some performance testsuites for comparative analysis : STL vs. Standard C Library.
> >
> > [snip]
>
> These list of performance tests are quite extensive but at least some of them
> seem rather
> out-dated now (I remember the year numbers of 1995/1998 or so). Did you also
> repeat these
> measurements under more modern compilers?
>
> Greetings from Bremen,
>
> Daniel Spangenberg
>

Hello, Daniel Spangenberg.
Thanks for your question.

I see this problem as follows:

1. Performance measurement tools.
To have a performance measurement tool I have developed C/C++ Perfometer.
The latest version (2.2.5) seems to be more or less portable.

The Perfometer makes it possible to measure C/C++ program performance in any
metric (for instance: clocks, uclocks, rusage metrics, user-defined metrics, etc.).

Each user can use it.


2. Set of testsuites.
The next important step is to define (build) a set of performance testsuites.
(I think) it is desirable that a unified (standard) set of performance testsuites should be created.

This set should be used to evaluate performance features (properties) of
data structures, compilers, operating systems, processors, etc.

Of course, this set might be updated from time to time.

I have not done it yet.


3. Metrics.
Also it is worth defining metrics which are of interest to users.
The rusage utility metrics might be used.


4. Environment (set of environments)
A set of relevant environments (compilers, operating systems, processors, etc) should be defined from the performance point of view
too.


5. Performing measurements.
I am working with GNU gcc/g++ 3.2 Windows 2000.
So, I can perform the measurements in this environment.
However first of all I would like to make progress with Paragraph 2 (Set of testsuites).

Regards,
==========================================
Alex Vinokur
mailto:ale...@connect.to
http://www.simtel.net/pub/oth/19088.html
http://sourceforge.net/users/alexvn
==========================================

John Potter

Jun 8, 2003, 7:11:18 PM
On 7 Jun 2003 05:45:28 -0400, a_e_li...@yahoo.com (Artem Livshits)
wrote:

> John Potter <jpo...@falcon.lhup.edu> wrote in message news:<p0evdvg0a6uu5vmt6...@4ax.com>...

> > For small N, the construction is a major part of the total. As size
> > increases, the sort dominates and vector shines.

> Well, sort of. The full time spent in vector_test (original test) is
> something like C1*N + C2*(NlgN) + C3*N where C1 is the constant the
> construction, C2 is the constant for sorting and C3 is the constant
> for unique-ing. Depending on what those constants really are we can
> get all kinds of results, but I would expect C1 to be at least not
> greater than C3 and C2, and most likely it's less. So even assuming
> that they are equal for N=10 the construction should be responsible
> for about 19% of the time (and most likely less). Not something I
> would call "a major part of the total".

If nothing else, this test shows that nothing works as expected. If
you do not measure it, you have no idea. Here is some data for vector
using N = maxSize / size. The first is range construct and the second
is build with copy to back_inserter. Note the major role of
construction for small size. Also note that it is not linear.

Build only
1 359 703
10 47 328
100 32 125
1000 15 47
10000 0 63
100000 62 141
1000000 63 124

Build and sort
1 453 782
10 156 437
100 250 360
1000 437 485
10000 562 625
100000 766 828
1000000 875 953

Build, sort, unique
1 453 812
10 172 469
100 266 359
1000 453 500
10000 563 640
100000 782 859
1000000 906 985

If you do measure it, you do not know what you measured. :)

John

Alex Vinokur

Jun 21, 2003, 8:24:32 PM