
Use of std::vector slower than arrays?


mike3

Nov 12, 2007, 6:40:49 PM
Hi.

I was writing a bignum package in C++. I noticed that a version of an
addition routine I had written using std::vector was a lot slower than
one using simple "C-style" arrays, and this
raises the question: is using std::vector slower than using arrays???

This is the code for the simple C-style add function using C-style
arrays and pointers.

// Element-by-element array add.
Digit arradd(Digit *a, Digit *b, Digit *c, int len)
{
    // set up pointers
    Digit *ap(a), *bp(b), *cp(c);

    // loop through the digits
    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bp + *cp + carry);               // add b/c's digits
        carry = carry ? (tmp <= *bp) : (tmp < *bp); // check overflow/carry
        *ap = tmp;                                  // set result
        ap++; bp++; cp++;                           // increment pointers
    }

    return carry;
}

This is the code for the one using std::vector:

// Addition using std::vector.
Digit vecadd(std::vector<Digit> *a,
             const std::vector<Digit> *b,
             const std::vector<Digit> *c,
             int len)
{
    // set up iterators
    std::vector<Digit>::iterator ai(a->begin());
    std::vector<Digit>::const_iterator bi(b->begin());
    std::vector<Digit>::const_iterator ci(c->begin());

    // loop through the digits
    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bi + *ci + carry);               // add b/c's digits
        carry = carry ? (tmp <= *bi) : (tmp < *bi); // check overflow/carry
        *ai = tmp;                                  // set result
        ai++; bi++; ci++;                           // increment iterators
    }

    return carry;
}

(Note: These routines assume all inputs are of the same length.
That's because they're just for testing purposes. Also, "Digit"
is a machine-word length type, for my machine it'd be 32 bits
long, so each one represents 32 bits of the number. The ordering
is little-endian, so the least-significant digit comes first.)
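
To see why the carry test works: unsigned arithmetic wraps around
modulo 2^32, so the sum of two digits overflowed exactly when the
result is smaller than an operand, and with a carry-in of 1, an equal
result also means overflow. A tiny standalone illustration, not part
of the package:

unsigned int x = 0xFFFFFFFFu; // largest 32-bit digit
unsigned int y = 1u;
unsigned int tmp = x + y;     // wraps around to 0
bool carry = (tmp < x);       // true: the addition carried out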

The timings show the simple array-based approach takes 4
seconds for 100 million operations on 128 bit numbers on my
machine. The one with std::vector, though, takes a whopping
40-41 seconds -- a 10x performance hit! OUCH!!! Why is that?
Is there any way to get around that? I was told(*) that it is better to use
std::vector than arrays, as it encapsulates the low-level
pointer stuff, making the software less prone to bugs. But in this
program, I need speed.

(*) See this post:
http://groups.google.com/group/comp.lang.c++/msg/a66278b77597d648

Michael

Nov 12, 2007, 6:59:31 PM
> I was writing a bignum package in C++. I noticed that a version of an
> addition routine I had written using std::vector was a lot slower than
> one using simple "C-style" arrays, and this
> raises the question: is using std::vector slower than using arrays???

Likely there is not a general answer other than "measure it." (With
the possible caveat "the standard says nothing about speed.")

In practical terms, I've noticed that debug builds under Visual C++
are dog-slow with STL-intensive tasks, presumably because it does all
sorts of extra checking to make sure you aren't writing past the end
of a vector or that kind of thing. The release builds are quite fast
though.

Michael

Mark P

Nov 12, 2007, 7:09:43 PM
mike3 wrote:
> Hi.
>
> I was writing a bignum package in C++. I noticed that a version of an
> addition routine I had written using std::vector was a lot slower than
> one using simple "C-style" arrays, and this
> raises the question: is using std::vector slower than using arrays???
>

Are you using an optimized build on a reasonably recent / decent compiler?

It's hard to say given what little you've shown us. We can't even see
how you're handling memory allocation. Try posting a complete program
(or two) that demonstrates the slowdown you've observed.

mike3

Nov 12, 2007, 7:33:14 PM

Ooh. I was using the Microsoft C++ compiler, by the way. I measured,
like I said in the post, and it came out to 4 seconds for 100,000,000
operations on a simple C-like array of "Digit" 128 bits long, while an
std::vector of the same length and the same # of operations came out
to a whopping 40 seconds, an utterly atrocious amount of time! What can be
done? Should I write my own class to encapsulate the array that allows
for both "safe", slow access (with all the bounds-checking and stuff)
and "unsafe", fast access (ie. lets you obtain a pointer directly to
the contents of the array and use it)? Or is there still a way to use
std::vector for this application?

mike3

Nov 12, 2007, 8:16:34 PM
On Nov 12, 5:09 pm, Mark P <use...@fall2005REMOVE.fastmailCAPS.fm>
wrote:
> mike3 wrote:
<snip>

> It's hard to say given what little you've shown us. We can't even see
> how you're handling memory allocation. Try posting a complete program
> (or two) that demonstrates the slowdown you've observed.

Well, here's a complete testing program, then.

---------------------------------------------------

// Performance comparison of std::vector vs C-style array for
// bignum arithmetic.
#include <iostream>
#include <iomanip>
#include <vector>
#include <time.h>

using namespace std;

typedef unsigned long Digit; // should be equal to the machine's wordsize

// Note: All arrays are stored in little-endian format.

// Test routine: Adds two digit strings together, stored as C arrays.
Digit arradd(Digit *a, Digit *b, Digit *c, int len)
{
    // set up pointers
    Digit *ap(a), *bp(b), *cp(c);

    // loop through the digits
    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bp + *cp + carry);               // add b/c's digits
        carry = carry ? (tmp <= *bp) : (tmp < *bp); // check overflow/carry
        *ap = tmp;                                  // set result
        ap++; bp++; cp++;                           // increment pointers
    }

    return carry;
}

// Test routine: Adds two digit strings together, stored with vector.
Digit vecadd(vector<Digit> *a,
             const vector<Digit> *b,
             const vector<Digit> *c,
             int len)
{
    // set up iterators
    vector<Digit>::iterator ai(a->begin());
    vector<Digit>::const_iterator bi(b->begin());
    vector<Digit>::const_iterator ci(c->begin());

    // loop through the digits
    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bi + *ci + carry);               // add b/c's digits
        carry = carry ? (tmp <= *bi) : (tmp < *bi); // check overflow/carry
        *ai = tmp;                                  // set result
        ai++; bi++; ci++;                           // increment iterators
    }

    return carry;
}

// The main routine.
int main()
{
    // Create the digit arrays and vectors.
    Digit a[4], b[4];
    vector<Digit> av, bv;

    // Set up the vectors' memory storage.
    av.reserve(4);
    bv.reserve(4);

    // Set the test values.
    a[0] = 0x4B9F0101UL; av.push_back(0x4B9F0101UL); // LSD
    a[1] = 0x56092310UL; av.push_back(0x56092310UL);
    a[2] = 0x56238413UL; av.push_back(0x56238413UL);
    a[3] = 0x44042121UL; av.push_back(0x44042121UL); // MSD
    b[0] = 0x4B9F0101UL; bv.push_back(0x4B9F0101UL); // LSD
    b[1] = 0x56092310UL; bv.push_back(0x56092310UL);
    b[2] = 0x56238413UL; bv.push_back(0x56238413UL);
    b[3] = 0x44042121UL; bv.push_back(0x44042121UL); // MSD

    // Now run the tests.
    time_t startt, endt;
    clock_t startcpu, endcpu;

    // Test 1
    cout << "Test 1: 100 million 128-bit additions using C arrays..."
         << endl;

    startt = time(0);
    startcpu = clock();

    for(int i(0); i<100000000; i++)
        arradd(a, a, b, 4); // a = (a + b) mod 2^128

    endcpu = clock();
    endt = time(0);

    cout << "Test 1 complete." << endl;
    cout << "CPU time elapsed: " << (endcpu - startcpu) / CLOCKS_PER_SEC
         << " seconds" << endl;
    cout << "Wall time elapsed: " << difftime(endt, startt)
         << " seconds" << endl;
    cout << "Final sum: " << hex << uppercase << setfill('0');
    for(int i(3); i>=0; i--)
        cout << setw(8) << a[i] << " ";
    cout << dec << nouppercase << endl;
    cout << endl;

    // Test 2
    cout << "Test 2: 100 million 128-bit additions using vector..."
         << endl;

    startt = time(0);
    startcpu = clock();

    for(int i(0); i<100000000; i++)
        vecadd(&av, &av, &bv, 4); // av = (av + bv) mod 2^128

    endcpu = clock();
    endt = time(0);

    cout << "Test 2 complete." << endl;
    cout << "CPU time elapsed: " << (endcpu - startcpu) / CLOCKS_PER_SEC
         << " seconds" << endl;
    cout << "Wall time elapsed: " << difftime(endt, startt)
         << " seconds" << endl;
    cout << "Final sum: " << hex << uppercase << setfill('0');
    for(int i(3); i>=0; i--)
        cout << setw(8) << av.at(i) << " ";
    cout << dec << nouppercase << endl;
    cout << endl;

    // Done.
    return 0;
}

---------------------------------------------------

Kai-Uwe Bux

Nov 12, 2007, 8:23:19 PM
mike3 wrote:

> On Nov 12, 4:59 pm, Michael <mchlg...@aol.com> wrote:
>> > I was writing a bignum package in C++. I noticed that a version of an
>> > addition routine I had written using std::vector was a lot slower than
>> > one using simple "C-style" arrays, and this
>> > raises the question: is using std::vector slower than using arrays???
>>
>> Likely there is not a general answer other than "measure it." (With
>> the possible caveat "the standard says nothing about speed.")
>>
>> In practical terms, I've noticed that debug builds under Visual C++
>> are dog-slow with STL-intensive tasks, presumably because it does all
>> sorts of extra checking to make sure you aren't writing past the end
>> of a vector or that kind of thing. The release builds are quite fast
>> though.
>>
>> Michael
>
> Ooh. I was using the Microsoft C++ compiler, by the way.

What about the optimization settings?

> I measured,
> like I said in the post, and it came out to 4 seconds for 100,000,000
> operations on a simple C-like array of "Digit" 128 bits long, while an
> std::vector of the same length and the same # of operations came out
> to a whopping 40 seconds, an utterly atrocious amount of time! What can be
> done? Should I write my own class to encapsulate the array that allows
> for both "safe", slow access (with all the bounds-checking and stuff)
> and "unsafe", fast access (ie. lets you obtain a pointer directly to
> the contents of the array and use it)? Or is there still a way to use
> std::vector for this application?

I venture the conjecture that your std::vector does bounds checking. Chances
are that it will disappear if you define NDEBUG and turn on optimization.
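
With the Microsoft compiler that might look something like this (a
plausible command line, not checked against your setup):

cl /EHsc /O2 /D NDEBUG test.cpp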


Best

Kai-Uwe Bux

Daniel T.

Nov 12, 2007, 8:33:30 PM
mike3 <mike...@yahoo.com> wrote:

> The timings show the simple array-based approach takes 4
> seconds for 100 million operations on 128 bit numbers on my
> machine. The one with std::vector, though, takes a whopping
> 40-41 seconds -- a 10x performance hit! OUCH!!! Why is that?

Because you did not optimize the build and/or did not remove the debug
checks from std::vector. Different compilers do things differently in
this regard. Check your documentation.

mike3

Nov 12, 2007, 9:13:35 PM
On Nov 12, 6:33 pm, "Daniel T." <danie...@earthlink.net> wrote:

Well, I tried it with maximum optimization on, and that
got the time down to a nice 6 seconds; however, the array-
based routine went down to a smoking-fast 2, so there's still a
3x performance gain from array use. I tried turning off the
debug, but that didn't seem to change it. Would it be a good
idea then to write my own encapsulation of the digit array,
with both "safe", slow access modes and "unsafe", fast ones
(ie. you could request a pointer to the digit array contained
in the thing and use it like a C-style array), then use this for
the time-critical areas (the bignum package) and std::vector
for everything else? Or would that just be silly and a waste
of work?

Jim Langston

Nov 12, 2007, 9:49:59 PM
"mike3" <mike...@yahoo.com> wrote in message
news:1194910849.3...@z24g2000prh.googlegroups.com...

I did some tests on my system and had dismal results also, but in release
things are a bit better, but not much.

Using your algorithm the array based was 6 seconds, the vector based was 29
seconds.

Using vectors with the array based algorithm
arradd(&av[0], &av[0], &bv[0], 4);
is 16 seconds.

I tried all the optimizations I could find but couldn't get an improvement
over these values.


mike3

Nov 13, 2007, 3:22:51 AM
On Nov 12, 7:49 pm, "Jim Langston" <tazmas...@rocketmail.com> wrote:
> "mike3" <mike4...@yahoo.com> wrote in message
<snip>

> > (*) See this post:
> > http://groups.google.com/group/comp.lang.c++/msg/a66278b77597d648
>
> I did some tests on my system and had dismal results also, but in release
> things are a bit better, but not much.
>
> Using your algorithm the array based was 6 seconds, the vector based was 29
> seconds.
>

And I doubt there is any faster algorithm than this to
add numbers, so it would seem the best thing to do here
would be to do away with std::vector. However, I've heard
that isn't so good, but then again, sometimes there are
such things as "necessary evils", and it looks like this
might just be one of them. If you read the post that
I linked to, you can see the discussion about the
"abstraction gap" in my previous attempt at an
implementation of the bignum package and the fractal
program, which is why I've been so hesitant to touch a
direct pointer. I'm afraid that if anyone else sees
the new code of the full bignum package, I'll look
like a doofus for refusing to use std::vector in there.

> Using vectors with the array based algorithm
> arradd(&av[0], &av[0], &bv[0], 4);
> is 16 seconds.
>
> I tried all the optimizations I could find but couldn't get an improvement
> over these values.

So then might it be a good idea to, perhaps, write my own
encapsulator for the digit arrays that provides both slow,
"safe" element handling _and_ allows one to fetch a pointer
that one can use for fast, "unsafe" handling? And only
use the "unsafe" direct pointer method in the performance-
critical areas of the program, which make up only a small
portion of the total codebase (four routines: add, sub, mul,
div). It does not need to be as complicated and general as
std::vector is; it would just need load/store, resize,
and a way to obtain a direct pointer. Would this be acceptable,
or would it be a no-no?
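
Something like this minimal sketch is what I have in mind (the names
are placeholders, not code from the actual package; it assumes the
Digit typedef from earlier):

#include <cstddef>
#include <vector>

class DigitArray
{
public:
    explicit DigitArray(std::size_t n) : d(n) {}

    // "safe", checked access
    Digit get(std::size_t i) const { return d.at(i); }
    void set(std::size_t i, Digit v) { d.at(i) = v; }

    void resize(std::size_t n) { d.resize(n); }
    std::size_t size() const { return d.size(); }

    // "unsafe" escape hatch: a raw pointer to the contiguous
    // digits, for the four time-critical routines only
    Digit *ptr() { return &d[0]; }
    const Digit *ptr() const { return &d[0]; }

private:
    std::vector<Digit> d;
};

// e.g. in the hot path: arradd(r.ptr(), x.ptr(), y.ptr(), len);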

Ian Collins

Nov 13, 2007, 3:49:24 AM
mike3 wrote:
> On Nov 12, 5:09 pm, Mark P <use...@fall2005REMOVE.fastmailCAPS.fm>
> wrote:
>> mike3 wrote:
> <snip>
>> It's hard to say given what little you've shown us. We can't even see
>> how you're handling memory allocation. Try posting a complete program
>> (or two) that demonstrates the slowdown you've observed.
>
> Well, here's a complete testing program, then.
>
<snip>

Just for fun (I had to increase the loops to get meaningful data):

Test 1: 1000 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 5 seconds
Wall time elapsed: 5 seconds
Final sum: 23106FE7 44D0A28A 64EF219D 9803CB01

Test 2: 1000 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 15 seconds
Wall time elapsed: 16 seconds
Final sum: 23106FE7 44D0A28A 64EF219D 9803CB01

--
Ian Collins.

Michael DOUBEZ

Nov 13, 2007, 4:24:21 AM
Ian Collins wrote:

Just for fun with g++ 3.4.1:
--------------------------------------
$ g++ vectorTest.cpp -o vectorTest
$ ./vectorTest
Test 1: 10 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 5 seconds
Wall time elapsed: 5 seconds
Final sum: 5D495F5B A2C3F794 86D31DED DE4E1781

Test 2: 10 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 25 seconds
Wall time elapsed: 26 seconds
Final sum: 5D495F5B A2C3F794 86D31DED DE4E1781
------------------------------------------
$ g++ vectorTest.cpp -o vectorTest -O3
$ ./vectorTest
Test 1: 10 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 3 seconds
Wall time elapsed: 3 seconds
Final sum: 5D495F5B A2C3F794 86D31DED DE4E1781

Test 2: 10 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 3 seconds
Wall time elapsed: 3 seconds
Final sum: 5D495F5B A2C3F794 86D31DED DE4E1781

Michael

Markus Moll

Nov 13, 2007, 4:24:11 AM
mike3 wrote:

> On Nov 12, 5:09 pm, Mark P <use...@fall2005REMOVE.fastmailCAPS.fm>
> wrote:
>> mike3 wrote:
> <snip>
>> It's hard to say given what little you've shown us. We can't even see
>> how you're handling memory allocation. Try posting a complete program
>> (or two) that demonstrates the slowdown you've observed.
>
> Well, here's a complete testing program, then.
>

I ran your testing program (with 300 million iterations) compiled with GCC. It
gives varying results, but on average it looks like this:


$ ./perf
Test 1: 300 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 4 seconds
Wall time elapsed: 5 seconds
Final sum: 4C03923341D2421 60447D54AEE9D13 6027024086C5310 54835F77C23A401

Test 2: 300 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 4 seconds
Wall time elapsed: 4 seconds
Final sum: 4C03923341D2421 60447D54AEE9D13 6027024086C5310 54835F77C23A401

Note that you dereference three extra pointers in your loop. If the compiler
isn't smart enough to optimize those away, it's not a fair comparison.
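
Taking the vectors by reference, for example, removes that
indirection. A sketch (untested) of the same routine:

// vecadd taking references instead of pointers
Digit vecadd2(std::vector<Digit> &a,
              const std::vector<Digit> &b,
              const std::vector<Digit> &c,
              int len)
{
    std::vector<Digit>::iterator ai(a.begin());
    std::vector<Digit>::const_iterator bi(b.begin());
    std::vector<Digit>::const_iterator ci(c.begin());

    Digit carry(0);
    while(len--)
    {
        Digit tmp(*bi + *ci + carry);
        carry = carry ? (tmp <= *bi) : (tmp < *bi);
        *ai = tmp;
        ++ai; ++bi; ++ci;
    }

    return carry;
}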

Markus

Ian Collins

Nov 13, 2007, 5:08:49 AM
Looks like you built it 64 bit and didn't adjust the values...

--
Ian Collins.

James Kanze

Nov 13, 2007, 5:40:35 AM
On Nov 13, 3:13 am, mike3 <mike4...@yahoo.com> wrote:
> On Nov 12, 6:33 pm, "Daniel T." <danie...@earthlink.net> wrote:

> > mike3 <mike4...@yahoo.com> wrote:
> > > The timings show the simple array-based approach takes 4
> > > seconds for 100 million operations on 128 bit numbers on my
> > > machine. The one with std::vector, though, takes a whopping
> > > 40-41 seconds -- a 10x performance hit! OUCH!!! Why is that?

> > Because you did not optimize the build and/or did not remove
> > the debug checks from std::vector. Different compilers do
> > things differently in this regard. Check your documentation.

> Well, I tried it with maximum optimization on, and that
> got the time down to a nice 6 seconds; however, the array-
> based routine went down to a smoking-fast 2, so there's still a
> 3x performance gain from array use. I tried turning off the
> debug, but that didn't seem to change it.

I think you mentioned earlier that you use VC++. I seem to
recall hearing that you need to add some special options (/D
something) to turn off all of the debug checking; just
setting the usual options for an optimized build isn't
sufficient. (No guarantee, but try /D_SECURE_SCL=0.)
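
Equivalently, you can define it in the source, before any standard
header is included (again from memory, so check your documentation):

#define _SECURE_SCL 0 // must come before any standard header
#include <vector>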

> Would it be a good idea then to write my own encapsulation of
> the digit array, with both "safe", slow access modes and
> "unsafe", fast ones (ie. you could request a pointer to the
> digit array contained in the thing and use it like a C-style
> array), then use this for the time-critical areas (the bignum
> package) and std::vector for everything else? Or would that
> just be silly and a waste of work?

It depends. If you want to be 100% sure of not losing any cycles,
regardless of the library implementation, then obviously, you'll
have to write your own. Of course, if you want to be 100% sure
that the code is really optimized, you'll have to write your own
compiler as well, or maybe write the critical parts in
assembler. (In assembler, for example, you'll be able to access
the carry bit directly, rather than needing some additional, ad
hoc test.)

Most of the time, of course, it should be sufficient to just
figure out how to get the best performance from your compiler.
Depending on the quality of your library implementation, you may
not be able to get top performance from it, and end up writing
your own, but I doubt that this will be a problem with the VC++
implementation. Finding out how to turn off all of the internal
checking might be, however.

--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

mathieu

Nov 13, 2007, 6:02:31 AM


Hi Mike,

You are not specifying which compiler version you are using. There
is one (the C++ .NET edition, I think) that is pretty cheap, but won't do
optimization. You need to get the VCFreeTools (you can still find them
on the net). Or try VS2005. std::vector is clearly implemented using a
C-style array, so there should be no difference. Check your C++
compiler documentation.

CMake (http://cmake.org) uses these flags for a release build:
/MD /O2 /Ob2 /D NDEBUG


HTH
-Mathieu

James Kanze

Nov 13, 2007, 6:32:46 AM
On Nov 13, 3:49 am, "Jim Langston" <tazmas...@rocketmail.com> wrote:
> "mike3" <mike4...@yahoo.com> wrote in message

> news:1194910849.3...@z24g2000prh.googlegroups.com...

[...]


> I did some tests on my system and had dismal results also, but
> in release things are a bit better, but not much.

> Using your algorithm the array based was 6 seconds, the vector
> based was 29 seconds.

> Using vectors with the array based algorithm
> arradd(&av[0], &av[0], &bv[0], 4);
> is 16 seconds.

> I tried all the optimizations I could find but couldn't get an
> improvement over these values.

Which compilers, and which options?

I gave the complete version he posted a quick try on the systems
I have at hand (a PC with VC++ 2005 and g++ 3.3.3---the CygWin
port, but I don't think that affects CPU bound programs much---,
an Intel PC with Linux and g++ 4.1.0, and a Sun Sparc with g++
4.1.0 and Sun CC 5.8---Sun CC comes with two different library
implementations). The results are interesting, to say the
least, and do indicate just how much the answer is
implementation dependent. I multiplied the loop count by 10 to
get measurable times. My compiler options were:

VC++: cl /vmg /GR /EHs /D_SECURE_SCL=0 /O2
g++: g++ -O3
Sun CC: CC -O4 and CC -library=stlport4 -O4

The results are interesting, to say the least:

                     array    vector
Windows PC:
  VC++                22        22
  g++                 30        38

Linux PC:
  g++                 33        42

Sun Sparc:
  g++                 73        77
  Sun standard        60       172
  Sun stlport         60       165

As we can see, g++ doesn't optimize as well as the other
compilers (not too surprising---portability has its price), for
example, and either the library implementations available with
Sun CC are very bad, or Sun CC has problems optimizing "more
advanced" code (or both---the library implementations *are*
very, very bad, however).

It's also interesting to note that the abstraction penalty with
g++ depends on the architecture: vector is 27% slower on a PC,
but only 5% slower on a Sun Sparc.

And VC++ deserves an accolade, both for the quality of its
library and the quality of its compiler. And a criticism: I had
to grunge around in the sources to find the _SECURE_SCL define;
while it didn't take me more than a couple of minutes, that's
because I'm used to this sort of thing---it really should be
documented somewhere. (Maybe it is, but I don't know how to
find it.)

Of course, if you want any real speed for this, you'll rewrite
the inner loop in assembler, where you have direct access to the
carry flag (and an add instruction which takes it into account,
so you don't have to test it anyway).
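
On an IA-32 machine, for example, the inner loop might look something
like the following (g++ extended asm; a sketch I haven't tested, fixed
at four digits to keep it short):

// 128-bit add letting the hardware adc instruction propagate the
// carry flag directly. Assumes Digit is a 32-bit word.
Digit adcadd4(Digit *a, const Digit *b, const Digit *c)
{
    Digit carry;
    __asm__(
        "movl   (%2), %%eax   \n\t"
        "addl   (%3), %%eax   \n\t" // a[0] = b[0] + c[0], sets CF
        "movl   %%eax, (%1)   \n\t"
        "movl   4(%2), %%eax  \n\t"
        "adcl   4(%3), %%eax  \n\t" // a[1] = b[1] + c[1] + CF
        "movl   %%eax, 4(%1)  \n\t"
        "movl   8(%2), %%eax  \n\t"
        "adcl   8(%3), %%eax  \n\t"
        "movl   %%eax, 8(%1)  \n\t"
        "movl   12(%2), %%eax \n\t"
        "adcl   12(%3), %%eax \n\t"
        "movl   %%eax, 12(%1) \n\t"
        "movl   $0, %0        \n\t" // mov leaves CF untouched...
        "adcl   $0, %0"             // ...so this captures the carry out
        : "=&r"(carry)
        : "r"(a), "r"(b), "r"(c)
        : "eax", "cc", "memory");
    return carry;
}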

Daniel T.

Nov 13, 2007, 7:10:56 AM
mike3 <mike...@yahoo.com> wrote:
> On Nov 12, 6:33 pm, "Daniel T." <danie...@earthlink.net> wrote:
> > mike3 <mike4...@yahoo.com> wrote:

> > > The timings show the simple array-based approach takes 4
> > > seconds for 100 million operations on 128 bit numbers on my
> > > machine. The one with std::vector, though, takes a whopping
> > > 40-41 seconds -- a 10x performance hit! OUCH!!! Why is that?
> >
> > Because you did not optimize the build and/or did not remove the debug
> > checks from std::vector. Different compilers do things differently in
> > this regard. Check your documentation.
>
> Well, I tried it with maximum optimization on, and that
> got the time down to a nice 6 seconds; however, the array-
> based routine went down to a smoking-fast 2, so there's still a
> 3x performance gain from array use. I tried turning off the
> debug, but that didn't seem to change it.

Then you missed something. There is absolutely no reason, based on the
code you showed us, that one routine would be faster than the other.

If you are using Visual C++, then check this out:
(http://msdn2.microsoft.com/en-us/library/aa985965(VS.80).aspx). Maybe
it will help.

> Would it be a good
> idea then to write my own encapsulation of the digit array,
> with both "safe", slow access modes and "unsafe", fast ones
> (ie. you could request a pointer to the digit array contained
> in the thing and use it like a C-style array), then use this for
> the time-critical areas (the bignum package) and std::vector
> for everything else? Or would that just be silly and a waste
> of work?

The latter. :-)

Keith Willis

Nov 13, 2007, 7:21:20 AM
On Tue, 13 Nov 2007 21:49:24 +1300, Ian Collins <ian-...@hotmail.com>
wrote:

I know it's all nonsense, but I wanted to play too :-)

Test 1: 100 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 4 seconds
Wall time elapsed: 9 seconds
Final sum: 40B88F68 5468071F 3DECEFB8 0675E201

Test 2: 100 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 4 seconds
Wall time elapsed: 9 seconds
Final sum: 40B88F68 5468071F 3DECEFB8 0675E201

This is on a sparc running Solaris 10, with the test compiled with g++
v3.4.5 with -O3.
--
PGP key ID 0xEB7180EC

Erik Wikström

Nov 13, 2007, 12:20:18 PM
On 2007-11-13 12:32, James Kanze wrote:
> On Nov 13, 3:49 am, "Jim Langston" <tazmas...@rocketmail.com> wrote:
>> "mike3" <mike4...@yahoo.com> wrote in message
>
>> news:1194910849.3...@z24g2000prh.googlegroups.com...
>
> [...]
>> I did some tests on my system and had dismal results also, but
>> in release things are a bit better, but not much.
>
>> Using your algorithm the array based was 6 seconds, the vector
>> based was 29 seconds.
>
>> Using vectors with the array based algorithm
>> arradd(&av[0], &av[0], &bv[0], 4);
>> is 16 seconds.
>
>> I tried all the optimizations I could find but couldn't get an
>> improvement over these values.
>
> Which compilers, and which options?

[snip benchmark]

> And VC++ deserves an accolade, both for the quality of its
> library and the quality of its compiler. And a criticism: I had
> to grunge around in the sources to find the _SECURE_SCL define;
> while it didn't take me more than a couple of minutes, that's
> because I'm used to this sort of thing---it really should be
> documented somewhere. (Maybe it is, but I don't know how to
> find it.)

It is quite well documented if you know where to look (search for
checked iterators at MSDN); the problem is that they do not tell you
about it on the pages describing the containers and iterators.

> Of course, if you want any real speed for this, you'll rewrite
> the inner loop in assembler, where you have direct access to the
> carry flag (and an add instruction which takes it into account,
> so you don't have to test it anyway).

MS would recommend using intrinsics (SIMD instructions), but of course
by then you have left everything called portability behind. And I have
no idea whether they would be useful for this purpose.

--
Erik Wikström

Bo Persson

Nov 13, 2007, 2:18:15 PM
James Kanze wrote:
<snip>

I just have to give another data point, showing that we *always* have
to measure to know for sure.

Again, VC++ 2005, but with an alternate standard library
implementation:

Test 1: 1000 million 128-bit additions using C arrays...
Test 1 complete.
CPU time elapsed: 23 seconds
Wall time elapsed: 24 seconds
Final sum: 23106FE7 44D0A28A 64EF219D 9803CB01

Test 2: 1000 million 128-bit additions using vector...
Test 2 complete.
CPU time elapsed: 14 seconds
Wall time elapsed: 14 seconds
Final sum: 23106FE7 44D0A28A 64EF219D 9803CB01

For some reason, the compiler here decides that it is proper to fully
inline the vecadd function, saving a call and the parameter passing.
On the other hand, it decides to call arradd() out-of-line while
pushing and popping parameters.


A negative abstraction penalty? :-)


Bo Persson


mike3

Nov 13, 2007, 5:35:58 PM
On Nov 13, 3:40 am, James Kanze <james.ka...@gmail.com> wrote:
> On Nov 13, 3:13 am, mike3 <mike4...@yahoo.com> wrote:
<snip>

Well, now I got it to go nice and fast by shutting off the
checks. Thanks for the answers, everyone.

James Kanze

Nov 14, 2007, 6:22:41 AM
On Nov 13, 6:20 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
> On 2007-11-13 12:32, James Kanze wrote:
> > On Nov 13, 3:49 am, "Jim Langston" <tazmas...@rocketmail.com> wrote:
> >> "mike3" <mike4...@yahoo.com> wrote in message

> >>news:1194910849.3...@z24g2000prh.googlegroups.com...

> > [...]


> > And VC++ deserves an accolade, both for the quality of its
> > library and the quality of its compiler. And a criticism: I had
> > to grunge around in the sources to find the _SECURE_SCL define;
> > while it didn't take me more than a couple of minutes, that's
> > because I'm used to this sort of thing---it really should be
> > documented somewhere. (Maybe it is, but I don't know how to
> > find it.)

> It is quite well documented if you know where to look (search
> for checked iterators at msdn), the problem is that they do
> not tell you about it on the pages describing the containers
> and iterators.

The problem is that it isn't documented with the rest of the
compiler flags controlling optimization. I know the reason, and
I know that Microsoft isn't alone in this; the reason I was able
to find it so quickly perusing the sources is because I've had
to do the same thing so often with other compilers. (Where does
g++ document -D_GLIBCXX_DEBUG, etc., for example?)

Such documentation belongs on the page with the compiler code
generation options. (Even better, of course, would be to add a
real option to the compiler driver, which defines the symbol one
way or another.)

> > Of course, if you want any real speed for this, you'll rewrite
> > the inner loop in assembler, where you have direct access to the
> > carry flag (and an add instruction which takes it into account,
> > so you don't have to test it anyway).

> MS would recommend to use intrinsics (SIMD instructions) but
> of course by then you have left everything called portability
> behind. And I have no idea whether they would be useful for
> this purpose.

Once you use assembler, you've abandoned portability. But some
things (like the handling of carry) can't be easily expressed in
C++, and yet are simple in assembler. (I once saw 2000 lines of
C, used for BCD arithmetic, rewritten in something like 40 lines
of assembler. In this case, the assembler was actually "higher
level" than C, since the target machine had hardware
instructions which did almost exactly what we needed. Of
course, the resulting assembler only worked on that one
machine.)
