
Vectors vs Arrays performance


Udit Sharma

Jan 2, 2010, 1:52:45 AM
Hi,

I am a newbie as far as C++ is concerned. I am just reading books like
Effective C++ and More Effective C++. So kindly bear with me if I have
made some very stupid mistake.

I have read in books that it is always better to use vectors rather
than arrays, that there is no performance hit, and that your code is
safer when using vectors.

I had to write a small program to compute the sum of digits in a
factorial of a number (<= 10000). I have written two versions: one
using vectors, and one using plain arrays.

But I notice a stark difference in execution times. The program using
arrays runs in about 0.309 seconds (to compute the sum of digits in
10000!) while the one using vectors takes about 2.9 seconds for the
same task. I compiled both on an Ubuntu 9.10 Linux machine with gcc
4.4.1 at optimization level 3.

I have listed both the programs here. Kindly tell me if the problem is
with the way I have written the code or something else.

//Program using arrays
#include <iostream>

int main() {
    unsigned int factorial = 0;
    std::cout << "Enter number : ";
    std::cin >> factorial;

    const unsigned MAXDIGITS = 10000, BASE = 10000;
    unsigned indDigits[MAXDIGITS]; // This is the array that I am talking about
    indDigits[0] = 1;
    unsigned int counter = 0, oanda = 0, size = 0;

    for (unsigned facts = 1; facts <= factorial; facts++) {
        oanda = 0; // reset the carry before each pass (as in the vector version)
        for (unsigned index = counter; index <= size; index++) {
            oanda += facts * indDigits[index];
            indDigits[index] = oanda % BASE;
            if (index == counter && !((oanda > 0) && (oanda != BASE)))
                counter++;
            oanda /= BASE;
        }
        if (oanda)
            indDigits[++size] = oanda;
    }

    unsigned sumDigits = 0;
    for (unsigned i = 0; i <= size; i++) {
        for (unsigned j = 1; j <= BASE / 10; j *= 10)
            sumDigits += (indDigits[i] / j) % 10;
    }
    std::cout << "Sum of Digits: " << sumDigits << std::endl;
}


//Same program using vectors
#include <iostream>
#include <vector>

typedef std::vector<unsigned> Vec;
typedef std::vector<unsigned>::iterator Iter;

int main() {
    unsigned int factorial = 0;
    std::cout << "Enter number : ";
    std::cin >> factorial;

    const unsigned BASE = 10000;
    Vec indDigits; // This is the vector that I am talking about
    indDigits.push_back(1);
    unsigned int counter = 0, oanda = 0;

    for (unsigned facts = 1; facts <= factorial; facts++) {
        oanda = 0;
        for (Iter iter = indDigits.begin() + counter;
             iter != indDigits.end(); iter++) {
            oanda += facts * (*iter);
            *iter = oanda % BASE;
            if ((iter == indDigits.begin() + counter)
                && !((oanda > 0) && (oanda != BASE)))
                counter++;
            oanda /= BASE;
        }
        if (oanda)
            indDigits.push_back(oanda);
    }

    unsigned sumDigits = 0;
    for (Iter iter = indDigits.begin(); iter != indDigits.end(); iter++) {
        for (unsigned j = 1; j <= BASE / 10; j *= 10)
            sumDigits += ((*iter) / j) % 10;
    }
    std::cout << "Sum of Digits: " << sumDigits << std::endl;
}

Thanks in advance.

Cheers
Udit

Udit Sharma

Jan 2, 2010, 2:06:57 AM
Hi,

I noticed that there are some formatting (additional tabs and spaces)
errors above.

You can also check the same code at

http://codepad.org/P6OoPyMO (for arrays version)

and

http://codepad.org/1z7K9wTZ (for vectors version)

I am sorry about the inconvenience.

Cheers
Udit

Eric Böse-Wolf

Jan 2, 2010, 3:37:09 AM
Hi,

running the script

http://codepad.org/AfkHj09t

I get the output

http://codepad.org/1HUP1jid

with

g++ (GCC) 4.4.1

Yours sincerely,

Eric

Leclerc

Jan 2, 2010, 9:33:42 AM
Hi,

(As you might already have concluded from Eric's post:) are you sure
you are measuring the speed of an optimized build? Note the '-O3' in
Eric's script.

The other thing you might do is reserve memory for the vector in
advance. Read the documentation for std::vector::reserve and
std::vector::push_back.


best regards,
Gordan

Leigh Johnston

Jan 2, 2010, 9:49:27 AM
Accessing a vector's contents via operator[] will always be slightly slower
than accessing a local (stack based) array via [], as you have to dereference
a pointer with the vector version. This pointer dereference may be
optimized to be done only once before a loop, though. The performance
difference is minor (maybe just a single extra CPU instruction), so it would
not cause the difference you are seeing.

Take a look at http://i42.co.uk/stuff/vecarray.htm for a container that has
the same performance as a local (stack based) array.

/Leigh

James Kanze

Jan 2, 2010, 2:04:03 PM
On Jan 2, 2:49 pm, "Leigh Johnston" <le...@i42.co.uk> wrote:
> Accessing a vector's contents via operator[] will always be
> slightly slower than accessing a local (stack based) array via
> [] as you have to dereference a pointer with the vector
> version.

That's bullshit. You have to dereference a pointer in both
cases. In the case of indexing, you have to calculate the
address (both for stack based and for std::vector), and because
these calculations are hidden in some functions with std::vector
(and probably do bounds checking as well), it's harder for the
optimizer to get its hands on them. On the other hand, if
you're using iterators, your code with std::vector is basically
what the optimizer will do with indexing into a built-in array.

So it all depends on the compiler. What will make a difference,
generally, is that when debug information is turned on and the
optimizer is turned off, most compilers will not inline. And
everything you do in std::vector involves a function call; if
that's not inlined, you could very well end up paying in
performance.

--
James Kanze

David Harmon

Jan 2, 2010, 2:28:15 PM
On Fri, 1 Jan 2010 22:52:45 -0800 (PST) in comp.lang.c++, Udit Sharma
<sharma...@gmail.com> wrote,

> const unsigned BASE = 10000;
> Vec indDigits; //This is the vector that am talking about
> indDigits.push_back(1);

You are comparing an array that is allocated once and never expanded
with a vector that is expanded and reallocated a large number of times.
In other words, you forgot
    indDigits.reserve(MAXDIGITS);

Leigh Johnston

Jan 2, 2010, 2:55:03 PM

"James Kanze" <james...@gmail.com> wrote in message
news:d215cd97-7eb5-4b38...@k17g2000yqh.googlegroups.com...


> On Jan 2, 2:49 pm, "Leigh Johnston" <le...@i42.co.uk> wrote:
>> Accessing a vector's contents via operator[] will always be
>> slightly slower than accessing a local (stack based) array via
>> [] as you have to dereference a pointer with the vector
>> version.
>
> That's bullshit. You have to dereference a pointer in both
> cases. In the case of indexing, you have to calculate the
> address (both for stack based and for std::vector), and because
> these calculations are hidden in some functions with std::vector
> (and probably do bounds checking as well), it's harder for the
> optimizer to get its hands on them. On the other hand, if
> you're using iterators, your code with std::vector is basically
> what the optimizer will do with indexing into a built-in array.
>

It is not bullshit: in the case of the stack array it is simply an offset
from the current stack pointer, no dereference required. Take a look at the
assembler output if you do not believe me; you should see an extra
instruction for the dereference of vector operator[] compared to the stack
array []. If you are using a pointer/reference to the array then they will
be the same.

#include <vector>

int main()
{
    volatile int output;
    int a[10];
    std::vector<int> v(10);
    _asm int 3;
    output = a[0];
    _asm int 3;
    output = v[0];
    _asm int 3;
}

outputs (VC++ Release Build):

00020 cc int 3
; 19 : output = a[0];
00021 8b 44 24 20 mov eax, DWORD PTR _a$[esp+72]
00025 89 44 24 0c mov DWORD PTR _output$[esp+72], eax
00029 cc int 3
; 21 : output = v[0];
0002a 8b 44 24 14 mov eax, DWORD PTR _v$[esp+76]
0002e 8b 08 mov ecx, DWORD PTR [eax]
00030 89 4c 24 0c mov DWORD PTR _output$[esp+72], ecx
00034 cc int 3

Notice the extra instruction? Don't be so quick to criticize in future...
:)

/Leigh

James Kanze

Jan 2, 2010, 4:23:05 PM
On Jan 2, 7:55 pm, "Leigh Johnston" <le...@i42.co.uk> wrote:
> "James Kanze" <james.ka...@gmail.com> wrote in message

> news:d215cd97-7eb5-4b38...@k17g2000yqh.googlegroups.com...

> > On Jan 2, 2:49 pm, "Leigh Johnston" <le...@i42.co.uk> wrote:
> >> Accessing a vector's contents via operator[] will always be
> >> slightly slower than accessing a local (stack based) array
> >> via [] as you have to dereference a pointer with the vector
> >> version.

> > That's bullshit. You have to dereference a pointer in both
> > cases. In the case of indexing, you have to calculate the
> > address (both for stack based and for std::vector), and
> > because these calculations are hidden in some functions with
> > std::vector (and probably do bounds checking as well), it's
> > harder for the optimizer to get its hands on them. On the
> > other hand, if you're using iterators, your code with
> > std::vector is basically what the optimizer will do with
> > indexing into a built-in array.

> It is not bullshit, in the case of the stack array it is
> simply an offset from the current stack pointer, no
> dereference required,

You can't access the value without having its address. And the
value is in memory, so accessing it involves dereferencing the
address.

> take a look at assembler output if you do not believe me, you
> should see an extra instruction for the dereference of vector
> operator[] compared to the stack array []. If you are using a
> pointer/reference to the array then they will be the same.

> int main()
> {
> volatile int output;
> int a[10];
> std::vector<int> v(10);
> _asm int 3;
> output = a[0];
> _asm int 3;
> output = v[0];
> _asm int 3;
> }

What on earth is the volatile doing in there? And the _asm?

> outputs (VC++ Release Build):

> 00020 cc int 3
> ; 19 : output = a[0];
> 00021 8b 44 24 20 mov eax, DWORD PTR _a$[esp+72]
> 00025 89 44 24 0c mov DWORD PTR _output$[esp+72], eax
> 00029 cc int 3
> ; 21 : output = v[0];
> 0002a 8b 44 24 14 mov eax, DWORD PTR _v$[esp+76]
> 0002e 8b 08 mov ecx, DWORD PTR [eax]
> 00030 89 4c 24 0c mov DWORD PTR _output$[esp+72], ecx
> 00034 cc int 3

> Notice the extra instruction?

I noticed a lot of extra instructions: a volatile, and some
_asm, to begin with.

If you'd have read the rest of my posting, you'd have seen that
I explained what is actually relevant, not some hypothetical
case which doesn't correspond to anything.

--
James Kanze

Leigh Johnston

Jan 2, 2010, 4:46:37 PM
OK, to be absolutely clear: when using operator[], std::vector has an *extra*
dereference in addition to the actual dereference to get an element's value,
compared to an array *on the stack* accessed *locally within the same scope*
(not via a pointer/reference to it). Clear now?

The volatile was there to ensure the compiler did not optimize away
assignments to the same variable (i.e. not relevant to the discussion) and
the asm int3's (breakpoints) were there to delineate the two cases in the
assembler output.

As you can see from the assembler output between the breakpoints the array
[] access involves one less instruction than the vector operator[]. A
std::vector object contains a pointer to the controlled sequence, not the
case for an array on the stack accessed locally from the same scope.

As mentioned in my original reply this extra instruction does not
contribute significantly to the timing differences observed, so other
causes have to be considered (debug instead of release build,
optimizations turned off, extra bounds checking etc.).

Aside: most sane implementations will inline std::vector<T>::operator[].

/Leigh

Balog Pal

Jan 2, 2010, 5:16:53 PM
"James Kanze" <james...@gmail.com>

>> > That's bullshit. You have to dereference a pointer in both
>> > cases. In the case of indexing, you have to calculate the
>> > address (both for stack based and for std::vector), and
>> > because these calculations are hidden in some functions with
>> > std::vector (and probably do bounds checking as well), it's
>> > harder for the optimizer to get its hands on them. On the
>> > other hand, if you're using iterators, your code with
>> > std::vector is basically what the optimizer will do with
>> > indexing into a built-in array.
>
>> It is not bullshit, in the case of the stack array it is
>> simply an offset from the current stack pointer, no
>> dereference required,
>
> You can't access the value without having its address. And the
> value is in memory, so accessing it involves dereferencing the
> address.

Come on, the difference must be evident with any stack-using
implementation -- that is the most common. For a direct array on the
stack the base address is already in the stack or frame pointer, and
needs no extra load. For the other, the base is at an arbitrary
location that must be loaded into a register with an extra
instruction.

The assembly presented shows exactly that. Whether that single extra
load, and the associated new area in cache, actually measures in an
application is a different question; the difference may get masked
entirely by a stall, but it may as well show.

>> take a look at assembler output if you do not believe me, you
>> should see an extra instruction for the dereference of vector
>> operator[] compared to the stack array []. If you are using a
>> pointer/reference to the array then they will be the same.
>
>> int main()
>> {
>> volatile int output;
>> int a[10];
>> std::vector<int> v(10);
>> _asm int 3;
>> output = a[0];
>> _asm int 3;
>> output = v[0];
>> _asm int 3;
>> }
>
> What on earth is the volatile doing in there? And the _asm?

Irrelevant to the point, but without such tricks MSVC would just
generate an empty return for main and remove all the code.

>> Notice the extra instruction?
>
> I noticed a lot of extra instructions: a volatile, and some
> _asm, to begin with.

it's

mov eax, DWORD PTR _a$[esp+72]

vs.

mov ecx, DWORD PTR _v$[esp+76]
mov eax, DWORD PTR [ecx]

If there were an offset calculation, it would add to the last line;
either way, one extra fetch is mandatory.

> If you'd have read the rest of my posting, you'd have seen that
> I explained what is actually relevant, not some hypothetical
> case which doesn't correspond to anything.

Your above-quoted reasoning is simply flawed.
The need to dereference *a* pointer in both cases does not imply that the
dereference can be made with the same amount of instructions/code. And it
is not a hypothesis, just a simple fact of life.

TANSTAAFL: extra indirection generally comes with some cost. That cost we
probably accept for the benefits on the other side, and should not deny.

Udit Sharma

Jan 2, 2010, 9:33:04 PM

Hi Leclerc,

I am sorry. I noticed that including the -O3 flag achieves almost the
same results for both the array and vector code. I had chosen the
"High Optimization" flag in the Code::Blocks IDE but somehow that was
not being passed to the compiler underneath. Manual compilation as
Eric and you suggested clearly brings the performance of both versions
to the same level.

Thanks.

Cheers
Udit

Udit Sharma

Jan 2, 2010, 10:12:59 PM
> You are comparing an array that is allocated once and never expanded
> with a vector that is expanded and reallocated a large number of times.
> In other words, you forgot
>      indDigits.reserve(MAXDIGITS);

David,

Even without the reserve the vectors version ran as fast as the arrays
version (well, almost). I guess then doing this would surely improve
the performance.

But I have one more question. If we know the number of elements that
we are going to store then is it always better to reserve that size
beforehand?

The reason I am asking is that I had read something to the
contrary in Stanley Lippman's C++ Primer.

Here is the exact quote (C++ Primer, Page 92, Key Concepts : Vectors
grow dynamically)

"A central property of vectors (and the other library containers) is
that they are required to be implemented so that it is efficient to
add elements to them at run time. Because vectors grow efficiently, it
is usually best to let the vector grow by adding elements to it
dynamically as the element values are known".

I have looked up some online references for more information about
reserve. Here is my understanding. Please correct me if I am wrong.

"When we add elements to the vector the compiler adds an instruction
to push the element into the contiguous block of memory allocated for
the vector. If this memory is not enough then the compiler adds
instructions to allocate a bigger block of memory (I read that this
increases by a factor of 2) and then moves all the elements from the
previous location to the newer location, and then the previously
allocated memory is freed. So in my program, since the above mentioned
steps would be done quite a few times, it unnecessarily adds to the
run time and hence my program would be slower".

If what I have said above is true then why does Stanley Lippman
suggest that it is always better to let vectors grow dynamically?

I hope this question is related to this thread. If it is not let me
know and I will repost it as a new topic.

Cheers
Udit

Miles Bader

Jan 2, 2010, 11:51:08 PM
"Leigh Johnston" <le...@i42.co.uk> writes:
> OK, to be absolutely clear: when using operator[], std::vector has an
> *extra* dereference in addition to the actual dereference to get an
> element's value, compared to an array *on the stack* accessed *locally
> within the same scope* (not via a pointer/reference to it). Clear now?

In many uses, this difference is irrelevant though, because the cost of the
first dereference can be amortized over many actual uses (hoisted out of
loops etc), and the "important" code ends up being exactly the same.

E.g., if you use this code (surrounding code omitted):

__asm ("a");
for (unsigned i = 0; i < 10; i++)
    a[i] = i;
__asm ("b");
for (unsigned i = 0; i < 10; i++)
    v[i] = i;
__asm ("c");

then with gcc 4.4.2 (-O3) you get:

#APP
# 7 "t.cc" 1
a
# 0 "" 2
#NO_APP
movl $0, (%rsp)
movl $1, 4(%rsp)
movl $2, 8(%rsp)
movl $3, 12(%rsp)
movl $4, 16(%rsp)
movl $5, 20(%rsp)
movl $6, 24(%rsp)
movl $7, 28(%rsp)
movl $8, 32(%rsp)
movl $9, 36(%rsp)
#APP
# 10 "t.cc" 1
b
# 0 "" 2
#NO_APP
movl $1, 4(%rax)
movl $2, 8(%rax)
movl $3, 12(%rax)
movl $0, (%rax)
movl $4, 16(%rax)
movl $5, 20(%rax)
movl $6, 24(%rax)
movl $7, 28(%rax)
movl $8, 32(%rax)
movl $9, 36(%rax)
#APP
# 13 "t.cc" 1
c
# 0 "" 2

Obviously compiler dependent, but any decent compiler should do
something similar. That's one reason why C++ vectors are nice -- they
offer some nice additional functionality, but aren't significantly
slower than primitive arrays.

-Miles

--
The secret to creativity is knowing how to hide your sources.
--Albert Einstein

Jan

Jan 3, 2010, 1:35:45 AM
On Jan 3, 5:46 am, "Leigh Johnston" <le...@i42.co.uk> wrote:
>
> OK, to be absolutely clear: when using operator[], std::vector has an *extra*
> dereference in addition to the actual dereference to get an element's value,
> compared to an array *on the stack* accessed *locally within the same scope*
> (not via a pointer/reference to it). Clear now?
>
> As you can see from the assembler output between the breakpoints the array
> [] access involves one less instruction than the vector operator[].  A
> std::vector object contains a pointer to the controlled sequence, not the
> case for an array on the stack accessed locally from the same scope.

Can you please explain the above two quotes slightly more clearly to
me? I am not familiar with assembly language programming.

In the examples given above, both the vector and the array are defined
in local scope. So why is there an additional instruction overhead to
access the elements of the vector but not of the array?

Is it because a vector is a whole object by itself (distinct from the
elements contained within it), and the contiguous block of memory is
just one part of it, so that an additional instruction is needed to
get to that part?

Cheers
Udit

Bo Persson

Jan 3, 2010, 4:54:38 AM

When someone says "always", he really means "almost always". :-)

But you are right, unless the pushing is the central part of your
program you will not notice the difference. So don't bother reserving
unless you can see a good reason for it.


Bo Persson


Carlo Milanesi

Jan 3, 2010, 6:03:37 AM
Udit Sharma wrote:

>> You are comparing an array that is allocated once and never expanded
>> with a vector that is expanded and reallocated a large number of times.
>> In other words, you forgot
>> indDigits.reserve(MAXDIGITS);
>
> David,
>
> Even without the reserve the vectors version ran as fast as the arrays
> version (well, almost). I guess then doing this would surely improve
> the performance.
>

> If we know the number of elements that
> we are going to store then is it always better to reserve that size
> beforehand?

I think so.

> The reason why I am asking this is because I had read something to the
> contrary in Stanley Lippman's C++ Primer.
>
> Here is the exact quote (C++ Primer, Page 92, Key Concepts : Vectors
> grow dynamically)
>
> "A central property of vectors (and the other library containers) is
> that they are required to be implemented so that it is efficient to
> add elements to them at run time. Because vectors grow efficiently, it
> is usually best to let the vector grow by adding elements to it
> dynamically as the element values are known".
>

> If what I have said above is true then why does Stanley Lippman
> suggest that it is always better to let vectors grow dynamically?

Lippman wrote "it is usually best", not "it is always better".
But I think he meant that it is usually wrong to set the "size" of the
vector bigger than needed. In that sentence he said nothing about the
"capacity" of the vector. The function "reserve" changes the capacity,
not the size.
Nevertheless, reserving the capacity of a vector beforehand may
increase performance in some cases. Yours is one of them.

> If this memory is not enough then the compiler adds
> instructions to allocate a bigger blok of memory

The compiler adds all the needed instructions at compile time. If the
vector capacity is not enough, the instructions to allocate a bigger
block of memory are actually executed.

> (I read that this increases by a factor of 2)

Not necessarily. For example, it may be multiplied by 7/4 and then
rounded up to the nearest power of two.

--

Carlo Milanesi
http://digilander.libero.it/carlmila

James Kanze

Jan 3, 2010, 7:10:50 AM
On Jan 2, 9:46 pm, "Leigh Johnston" <le...@i42.co.uk> wrote:
> OK, to be absolutely clear when using operator[] std::vector
> has an *extra* dereference in addition to the actual deference
> to get an element's value compared to an array *on the stack*
> accessed *locally within the same scope* (not via a
> pointer/reference to it). Clear now?

It's clear that you're claiming something that is in general
false.

Consider first the simple case:

int
main()
{
    int a1[10];
    for (int i = 0; i < 10; ++i) {
        a1[i] = i;
    }
    int* a2 = new int[10];
    for (int i = 0; i < 10; ++i) {
        a2[i] = i;
    }
    // additional code which uses the arrays, to ensure
    // that the compiler doesn't optimize them out.
    delete[] a2;
}

Any decent compiler should generate exactly the same code for both
loops.

The second loop, of course, basically corresponds to what goes
on under the hood in std::vector, at least with regards to the
indexing. The main difference is that all of the operations in
std::vector are, in fact, function calls. Which, depending on
the compiler, may make recognizing the pattern more difficult.

[...]


> As you can see from the assembler output between the
> breakpoints the array [] access involves one less instruction
> than the vector operator[].

In one particular case, on one particular machine, with one
particular compiler. It's certainly not a general case. And
it's not a realistic case because of the volatile and the inline
assembler (both of which affect compiler optimization
significantly).

> A std::vector object contains a pointer to the controlled
> sequence, not the case for an array on the stack accessed
> locally from the same scope.

Both require the compiler to access through a pointer. Both
require some sort of pointer arithmetic. Beyond that, anything
you can say depends on the compiler and the machine hardware.
And the environment in which the array access occurs, which
affects how the compiler optimizes.

> As mentioned in my original reply this extra instruction does
> not contribute significantly to the timing differences observed so
> other causes have to be considered (debug instead of release
> build, optimizations turned off, extra bounds checking etc.)

Exactly. Other considerations which generally mean that there
won't be an extra instruction.

> Aside: most sane implementations will inline std::vector<T>::operator[].

I think all do:-). On the other hand, depending on optimization
levels and other compiler options, the compiler may or may not
respect the request to inline. (Without optimization, I don't
think g++ ever inlines. But I could be wrong about that.)

--
James Kanze

James Kanze

Jan 3, 2010, 7:31:58 AM
On Jan 2, 10:16 pm, "Balog Pal" <p...@lib.hu> wrote:
> "James Kanze" <james.ka...@gmail.com>

> >> > That's bullshit. You have to dereference a pointer in
> >> > both cases. In the case of indexing, you have to
> >> > calculate the address (both for stack based and for
> >> > std::vector), and because these calculations are hidden
> >> > in some functions with std::vector (and probably do
> >> > bounds checking as well), it's harder for the optimizer
> >> > to get its hands on them. On the other hand, if you're
> >> > using iterators, your code with std::vector is basically
> >> > what the optimizer will do with indexing into a built-in
> >> > array.

> >> It is not bullshit, in the case of the stack array it is
> >> simply an offset from the current stack pointer, no
> >> dereference required,

> > You can't access the value without having its address. And
> > the value is in memory, so accessing it involves
> > dereferencing the address.

> Come on, the difference must be evident with any stack-using
> implementation -- that is the most common.

Having actually implemented compilers for stack based
implementations, it's quite evident to me that the difference
depends, and that the extra dereference should actually be quite
rare.

> For a direct array on the stack the base address is already in
> the stack or frame pointer, and needs no extra load. For the
> other, the base is at an arbitrary location that must be loaded
> into a register with an extra instruction.

Most of the time, on most hardware, the base pointer for the
array will also already be in a register. Unless you've got a
very bad compiler. (It's a little more difficult for Intel
architecture, because of the paucity of registers. But
Microsoft's C 1.0 managed most of the time, so it's hardly
leading edge, even on Intel.)

> The assy presented shows exactly that.

The assembly shows that when you surround a single access with
inline assembler and accesses to a volatile, the compiler
doesn't optimize very well. Something I was aware of before
even looking at the assembler.

> Whether that single extra load, and associated new area in
> cache actually measures in an application is a different
> question, the difference may get masked entirely by a stall,
> but it may as well show.

It won't show, because it won't be present in any critical
paths.

Why does everyone suppose that compilers are so stupid? This
sort of optimization has been present in most compilers for well
over 20 years now.

> >> take a look at assembler output if you do not believe me, you
> >> should see an extra instruction for the dereference of vector
> >> operator[] compared to the stack array []. If you are using a
> >> pointer/reference to the array then they will be the same.

> >> int main()
> >> {
> >> volatile int output;
> >> int a[10];
> >> std::vector<int> v(10);
> >> _asm int 3;
> >> output = a[0];
> >> _asm int 3;
> >> output = v[0];
> >> _asm int 3;
> >> }

> > What on earth is the volatile doing in there? And the _asm?

> Irrelevant to the point,

Very relevant to the point, because they create an environment
in which the compiler won't optimize.

> but without such tricks MSVC would just generate
> an empty return for main and remove all the code.

Probably. But ensuring that no optimization is possible isn't
very realistic either, since in real code, optimization is
possible.

Note too that the old rule: "never trust a benchmark you didn't
falsify yourself" holds here as well. If I wanted to "prove"
that an extra instruction was required, I know how to write a
"realistic" benchmark that would do it. Except that 1) it would
really only prove the point for a given compiler on a given
architecture, and 2) "realistic" is in the eyes of the beholder:
while volatile and inline assembler certainly aren't realistic,
who knows where you really have to draw the line. (FWIW: my
case would use a virtual function returning a[i] from a private
member array. In my experience, virtual functions are really
great optimization blockers for a lot of compilers---including
VC++. But is it realistic to have a virtual function which just
makes one access to a single element? That probably depends on
the application domain, but I've never seen it; whenever I'm
using an array, I'm accessing a large number of elements
locally, which means that the compiler will have hoisted the
pointer to the array data into a register. Regardless of where
that pointer comes from.)

> >> Notice the extra instruction?

> > I noticed a lot of extra instructions: a volatile, and some
> > _asm, to begin with.

> it's

> mov eax, DWORD PTR _a$[esp+72]

> vs.

> mov ecx, DWORD PTR _v$[esp+76]
> mov eax, DWORD PTR [ecx]

Yes, but that code wouldn't necessarily be generated if there
wasn't a volatile nor any inline assembler.

> If there is an offset calculation, it would add to the last line
> respectively, one extra fetch is mandatory.

One extra fetch is mandatory IF your code is twisted enough to
inhibit all compiler optimizations. In practice, the real code
I've seen isn't (and the architecture hasn't been Intel, which
changes things as well), and pointers to the array have always
been in a register.

[...]


> TANSTAAFL, extra indirections generally comes with some cost.
> That we probably accept for the benefits on the other side,
> and should not deny.

Except that the extra indirection isn't always, or even usually,
present. If you pass the array to a function, and access it
there, for example, there is no difference between an array on
the stack and an array on the heap. (There may be a difference
between a C style array and std::vector, because many compilers
systematically pass class type objects in memory, but pointers
in a register.) If you use the array several times in the same
function, the compiler will also hoist the address calculations
up to before the first use---you don't get an extra access for
each use. And so on. The only way you can compare is to
implement both versions, with real code, and time them.

--
James Kanze

James Kanze

Jan 3, 2010, 7:40:56 AM
On Jan 3, 3:12 am, Udit Sharma <sharmaa.u...@gmail.com> wrote:

[...]


> But I have one more question. If we know the number of
> elements that we are going to store then is it always better
> to reserve that size beforehand?

It's generally not worth the effort, but if performance becomes
an issue, it's one of the simplest first steps to try.

> The reason why I am asking this is because I had read
> something to the contrary in Stanley Lipmmann's C++ Primer.

> Here is the exact quote (C++ Primer, Page 92, Key Concepts :
> Vectors grow dynamically)

> "A central property of vectors (and the other library
> containers) is that they are required to be implemented so
> that it is efficient to add elements to them at run time.
> Because vectors grow efficiently, it is usually best to let
> the vector grow by adding elements to it dynamically as the
> element values are known".

Certainly. It's best not to worry about it (since the
performance should be "reasonable") until the profiler tells you
otherwise.

Note too that reserve does *not* grow the vector. It simply
ensures that the memory is pre-allocated so that growing the
vector won't require reallocation. (And most of the time I use
it, it is to guarantee lifetime of iterators, which may be
invalidated by reallocation, rather than for performance
reasons.)

> I have looked up some online references for more information
> about reserve. Here is my understanding. Please correct me if
> I am wrong.

> "When we add elements to the vector the compiler adds an
> instruction to push the element into the contiguous block of
> memory allocated for the vector. If this memory is not enough
> then the compiler adds instructions to allocate a bigger block
> of memory (I read that this increases by a factor of 2) and
> then moves all the elements from the previous location to the
> newer location and then the previously allocated memory is
> freed. So in my program since the above mentioned steps would
> be done quite a few times, it unnecessarily adds to the run
> time and hence my program would be slower".

Well, to start with, the compiler doesn't add any instructions;
they're all there to begin with:-). But the code does execute
more or fewer instructions.

> If what I have said above is true then why does Stanley
> Lippmann suggest that it is always better to let vectors grow
> dynamically?

Because the size increase is exponential, which means that over
a long period of time, push_back has amortized constant time,
and isn't expensive enough to worry about. (Until the profiler
tells you otherwise.)

Once the profiler indicates that you have a problem here, then
reserve is the simplest solution. For objects which are cheap
to copy, another alternative is resize, then use [] rather than
push_back. Logically, this would seem slower, but at least in
the one trial I did (i.e. for that one machine, with that
particular compiler and implementation of the library), resize
and [] were faster, despite actually initializing each cell
twice.

--
James Kanze

Leigh Johnston

Jan 3, 2010, 7:51:21 AM
In my original reply I stated "This pointer dereference may by

Leigh Johnston

Jan 3, 2010, 8:08:25 AM
We can come up with examples and counter-examples all day long.

Consider this:

#include <vector>
// (timer and lib::vecarray are the poster's own utilities; see the link below)

int main()
{
std::vector<int> v;
v.reserve(1000);
lib::vecarray<int, 1000> va;
{
timer t("vector: ");
for (int i = 0; i != 1000000; ++i)
{
for (int j = 0; j != 1000; ++j)
v.push_back(j);
v.clear();
}
}
{
timer t("vecarray: ");
for (int i = 0; i != 1000000; ++i)
{
for (int j = 0; j != 1000; ++j)
va.push_back(j);
va.clear();
}
}
}

Which outputs (several runs done, VC++ release build, secure STL disabled):

vector: 2.9741 seconds
vecarray: 2.6782 seconds

Which indicates (for this example) that a stack based container
(http://i42.co.uk/stuff/vecarray.htm) is 11% quicker than std::vector.

1) std::vector contains a pointer to its elements, a stack based container
(or an ordinary array on the stack) does not
2) you may benefit from the locality of your container and other local
variables (think CPU cache)

Have you seen the film "The Matrix"? "There is no spoon." "There is no
pointer.".

/Leigh


Leigh Johnston

Jan 3, 2010, 8:14:18 AM

"Jan" <janani...@gmail.com> wrote in message
news:f470e432-53ff-4bf1...@c3g2000yqd.googlegroups.com...

>
> Is it because a vector is an whole object by itself (different from
> the elements contained within it) and that the contiguous block of
> memory within it is just a sub part of it and that an additional
> instruction is needed to get to this sub part?
>

Correct: the std::vector object contains a pointer to its elements; the array
on the stack does not. Your original problem, though, was probably due to you
not calling std::vector<T>::reserve(...).

/Leigh

Leigh Johnston

Jan 3, 2010, 8:45:30 AM
If I increase the container size to 10000 elements the difference is more
pronounced:

vector: 33.0910 seconds
vecarray: 26.9410 seconds

The stack based container is now 23% faster than std::vector.

/Leigh

Leigh Johnston

Jan 3, 2010, 8:56:02 AM
>
> In one particular case, on one particular machine, with one
> particular compiler. It's certainly not a general case. And
> it's not a realistic case because of the volatile and the inline
> assembler (both of which affect compiler optimization
> significantly).
>

volatile is not relevant to this discussion; it was simply used to
illustrate that *at least one* extra pointer dereference is required even if
this is subsequently cached in a register. VC++ would optimize both
assignments completely away without it.

You *cannot* avoid *at least one* extra pointer dereference when using
std::vector compared to a stack based container or array.

BTW I am not eschewing the use of std::vector due to an optimization gain
that varies wildly depending on use-case, I hardly use stack based arrays
now, I use std::vector a lot or my own stack based container.

/Leigh

Balog Pal

Jan 3, 2010, 11:41:35 AM
"James Kanze" <james...@gmail.com>

>> Come on, the difference must be evident with any stack-using
>> implementation -- that is the most common.
>
> Having actually implemented compilers for stack based
> implementations, it's quite evident to me that the difference
> depends, and that the extra dereference should actually be quite
> rare.

Well, so far you have not said anything that would explain the last part --
how you reached the "quite rare" figure.

The situation is like having points A and C, case one: go straight line
A->C, case two: touch point B first. B may happen to be on the line, or
moved there by some tricks -- or the whole trip may be ignored in the full
travel context, but the difference still is one way, and its being zero
depends on a bunch of 'chance' factors.

And those factors are the kind IMO you can't tell.

>> For a direct array on the stack the base address is already in
>> the stack or frame pointer, and needs no extra load. For the
>> other, the base is at an arbitrary location that must be loaded
>> into a register with an extra fetch.
>
> Most of the time, on most hardware, the base pointer for the
> array will also already be in a register. Unless you've got a
> very bad compiler. (It's a little more difficult for Intel
> architecture, because of the paucity of registers. But
> Microsoft's C 1.0 managed most of the times, so it's hardly
> leading edge, even on Intel.)

To actually have that, the compiler must:
- inline vector's constructor or last resize() operation that sets the data
address
- the arbitrary code that follows until the next index op shall not use all
the available data registers
- if the code has any of vector's operations those also must all be inlined,
or the compiler use special knowledge that they do not change the state

To me it looks like too many assumptions tied to arbitrary/unknown code.
Your example in another post just treats that code portion as empty -- I
agree that for that case we can expect no extra load. But that case is too
tweaked to serve as a 'proof'.

>> The assy presented shows exactly that.
>
> The assembly shows that when you surround a single access with
> inline assembler and accesses to a volatile, the compiler
> doesn't optimize very well. Something I was aware of before
> even looking at the assembler.

In the presented case the compiler did a good job, and I'm sure it would
produce the same code for many different layouts using other tricks against
removal.

>> Whether that single extra load, and associated new area in
>> cache actually measures in an application is a different
>> question, the difference may get masked entirely by a stall,
>> but it may as well show.
>
> It won't show, because it won't be present in any critical
> paths.

How can you know critical paths of unknown/arbitrary code?

> Why does everyone suppose that compilers are so stupid? This
> sort of optimization has been present in most compilers for well
> over 20 years now.

Why do you assume we think that? In many posts here and elsewhere I'm the one
usually pointing out that a healthy optimizer shall treat different-looking
expressions the same.

The (good) optimizer can discover many things, and remove redundant stuff --
but it is not magic. The vector case uses an extra resource, and depends on
much. Also, the saving is about the smallest possible, a single pointer
fetch -- a smart optimizer's first pick to drop if there is any doubt.

>> but without such tricks MSVC would just generate
>> an empty return for main and remove all the code.
>
> Probably. But ensuring that no optimization is possible isn't
> very realistic either, since in real code, optimization is
> possible.

Yeah -- subject to (at least) conditions mentioned above.

> Note too that the old rule: "never trust a benchmark you didn't
> falsify yourself" holds here as well. If I wanted to "prove"
> that an extra instruction was required, I know how to write a
> "realistic" benchmark that would do it.

I thought we were discussing an 'equivalence' issue, and being serious about
it, not a contest to cheat each other or the public.

> Except that 1) it would
> really only prove the point for a given compiler on a given
> architecture, and 2) "realistic" is in the eyes of the beholder:
> while volatile and inline assembler certainly aren't realistic,
> who knows where you really have to draw the line.

In general, possibly, for this case those were irrelevant and did not mess
with the relevant content.


> (FWIW: my
> case would use a virtual function returning a[i] from a private
> member array. In my experience, virtual functions are really
> great optimization blockers for a lot of compilers---including
> VC++.

Huh? Of course they are -- there is a call the compiler can't follow, and
the called function has license to change anything, including a private
member vector's state. That holds for most practical cases even if you use
const like crazy.

And there are many similar 'blockers'. That is why I don't get your claim
of 'rare'; my educated guess would allow for at least one extra fetch
somewhere -- and call its omission a case of good luck.

> But is it realistic to have a virtual function which just
> makes one access to a single element?

Err, I think I lost you; as I got it originally, the function may do anything,
and it is arbitrary. And having vector members and virtual functions is
hardly unrealistic. (Even with my position that in general non-virtual
functions outnumber virtuals by faaaar. ;)

> That probably depends on
> the application domain, but I've never seen it; whenever I'm
> using an array, I'm accessing a large number of elements
> locally, which means that the compiler will have hoisted the
> pointer to the array data into a register. Regardless of where
> that pointer comes from.)

IMO we all agreed that it *can* be hoisted. Also, that even if it isn't, the
overall speed difference is likely amortized. And that using vector has too
many advantages to even think of old-style arrays without a very good and
profiled reason.

Yet, denying away the difference is IMO not fair -- the users are better
off aware of the difference.

>> If there is an offset calculation, it would add to the last line
>> respectively, one extra fetch is mandatory.
>
> One extra fetch is mandatory IF your code is twisted enough to
> inhibit all compiler optimizations. In practice, the real code
> I've seen isn't (and the architecture hasn't been Intel, which
> changes things as well), and pointers to the array have always
> been in a register.
>
> [...]
>> TANSTAAFL, extra indirections generally come with some cost.
>> That we probably accept for the benefits on the other side,
>> and should not deny.
>
> Except that the extra indirection isn't alway, or even usually,
> present. If you pass the array to a function, and access it
> there, for example, there is no difference between an array on
> the stack and an array on the heap.

If you pass it the same way, that is... Then the difference is restricted to
the call site (and is exactly as discussed this far). If you pass the
vector by reference, then it remains. If you pass iterators, then you depend
on their implementation, some use plain pointers leading to identical code,
others may use fatties.

> (There may be a difference
> between a C style array and std::vector, because many compilers
> systematically pass class type objects in memory, but pointers
> in a register.) If you use the array several times in the same
> function, the compiler will also hoist the address calculations
> up to before the first use---you don't get an extra access for
> each use. And so on.

Yeah, sure. ;) Stated *that* way there is nothing to argue.

Claiming the difference as 'bullshit' is different.


Udit Sharma

Jan 3, 2010, 11:39:25 AM
James and Leigh,

Thanks a lot for your clarifications. It helped me understand things
better.

James,

When I said the compiler adds instructions, I actually meant that it
places them somewhere in the executable to be used when a bigger
block of memory needs to be created :-D. I'm sorry that what I said
earlier was a bit misleading about compilers adding instructions at
run time. Pardon my English.

Cheers
Udit

Leigh Johnston

Jan 3, 2010, 12:12:50 PM
>>> The assy presented shows exactly that.
>>
>> The assembly shows that when you surround a single access with
>> inline assembler and accesses to a volatile, the compiler
>> doesn't optimize very well. Something I was aware of before
>> even looking at the assembler.
>
> In the presented case the compiler did a good job, and I'm sure it would
> produce the same code for many different layouts using other tricks
> against removal.
>

A version without any "scary" volatile or inline assembler:

#include <vector>

void foo(int);

int main()
{
void (*fp)(int) throw() = &foo;


int a[10];
std::vector<int> v(10);

fp(0);
fp(a[0]);
fp(v[0]);
}

produces:

; 15 : fp(a[0]);

0004f 8b 44 24 28 mov eax, DWORD PTR _a$[esp+96]
00053 83 c4 04 add esp, 4
00056 50 push eax
00057 e8 00 00 00 00 call ?foo@@YAXH@Z ; foo

; 16 : fp(v[0]);

0005c 8b 44 24 1c mov eax, DWORD PTR _v$[esp+100]
00060 8b 08 mov ecx, DWORD PTR [eax]
00062 83 c4 04 add esp, 4
00065 51 push ecx
00066 e8 00 00 00 00 call ?foo@@YAXH@Z ; foo
0006b 83 c4 04 add esp, 4

Again the pertinent bits are:

mov eax, DWORD PTR _a$[esp+96]

vs.

mov eax, DWORD PTR _v$[esp+100]


mov ecx, DWORD PTR [eax]

One extra instruction for the std::vector version. And for the avoidance of
doubt I repeat again that I realize that the register can be re-used in a
subsequent loop, a fact I stated in my original post if you care to read it
properly, Mr James Kanze.

Nick Keighley

Jan 4, 2010, 8:18:56 AM
On 2 Jan, 19:04, James Kanze <james.ka...@gmail.com> wrote:
> On Jan 2, 2:49 pm, "Leigh Johnston" <le...@i42.co.uk> wrote:

> > Accessing a vector's contents via operator[] will always be
> > slightly slower than accessing a local (stack based) array via
> > [] as you have to dereference a pointer with the vector
> > version.
>
> That's bullshit.  You have to dereference a pointer in both
> cases.  In the case of indexing, you have to calculate the
> address (both for stack based and for std::vector), and because
> these calculations are hidden in some functions with std::vector
> (and probably do bounds checking as well),

I thought operator[] didn't do bounds checking (or not usually) whilst
at() did.

James Kanze

Jan 4, 2010, 5:16:50 PM
On Jan 3, 1:08 pm, "Leigh Johnston" <le...@i42.co.uk> wrote:
> We can come up with examples and counter-examples all day long.

Exactly. That's what I've been saying from the beginning. You
can't say absolutely that there is an extra access. There might
be. There might not be. It all depends on the actual use, the
compiler and the underlying architecture.

> Consider this:

> int main()
> {
> std::vector<int> v;
> v.reserve(1000);
> lib::vecarray<int, 1000> va;
> {
> timer t("vector: ");
> for (int i = 0; i != 1000000; ++i)
> {
> for (int j = 0; j != 1000; ++j)
> v.push_back(j);
> v.clear();
> }
> }
> {
> timer t("vecarray: ");
> for (int i = 0; i != 1000000; ++i)
> {
> for (int j = 0; j != 1000; ++j)
> va.push_back(j);
> va.clear();
> }
> }
> }

> Which outputs (several runs done, VC++ release build, secure
> STL disabled):

> vector: 2.9741 seconds
> vecarray: 2.6782 seconds

> Which indicates (for this example) that a stack based container
> (http://i42.co.uk/stuff/vecarray.htm) is 11% quicker than std::vector.

For this example, with this compiler, on this machine. In
practice, I've had similar results with Sun CC on a Sparc, so
your measurements aren't unique. But in this case, you're
measuring the time necessary to grow the array, which is not the
same thing as the time necessary for an access.

If you'll remember, your original statement was: "Accessing a
vector's contents via operator[] will always be slightly slower
than accessing a local (stack based) array via [] as you have to
dereference a pointer with the vector version." That's the only
statement I'm disagreeing with, and in that statement, what I'm
really disagreeing with is the *always*. Because in most cases,
I think that the "dereference a pointer" simply won't be present
in the individual accesses. Or it will be present with the
"stack based array", because you'll have passed it to a
function. Your statement simply doesn't correspond to the
reality in actual programs.

--
James Kanze

James Kanze

Jan 4, 2010, 5:19:51 PM
On Jan 3, 1:56 pm, "Leigh Johnston" <le...@i42.co.uk> wrote:
> > In one particular case, on one particular machine, with one
> > particular compiler. It's certainly not a general case. And
> > it's not a realistic case because of the volatile and the inline
> > assembler (both of which affect compiler optimization
> > significantly).

> volatile is not relevant to this discussion, it was simply
> used to illustrate that *at least one* extra pointer
> dereference is required even if this is subsequently cached in
> a register. VC++ would optimize both assignments completely
> away without it.

Volatile is very relevant, because anything which affects
optimization is very relevant when you start speaking of
generated code.

> You *cannot* avoid *at least one* extra pointer dereference
> when using std::vector compared to a stack based container or
> array.

Nevertheless, most compilers do, most of the time.

> BTW I am not eschewing the use of std::vector due to an
> optimization gain that varies wildly depending on use-case, I
> hardly use stack based arrays now, I use std::vector a lot or
> my own stack based container.

Certainly. I understand that. I still think your basic precept
is wrong. Compilers are not at all as dumb as you seem to
think.

--
James Kanze

Leigh Johnston

Jan 4, 2010, 5:30:46 PM
>
> For this example, with this compiler, on this machine. In
> practice, I've had similar results with Sun CC on a Sparc, so
> your measurements aren't unique. But in this case, you're
> measuring the time necessary to grow the array, which is not the
> same thing as the time necessary for an access.
>
> If you'll remember, your original statement was: "Accessing a
> vector's contents via operator[] will always be slightly slower
> than accessing a local (stack based) array via [] as you have to
> dereference a pointer with the vector version." That's the only
> statement I'm disagreeing with, and in that statement, what I'm
> really disagreeing with is the *always*. Because in most cases,
> I think that the "dereference a pointer" simply won't be present
> in the individual accesses. Or it will be present with the
> "stack based array", because you'll have passed it to a
> function. Your statement simply doesn't correspond to the
> reality in actual programs.
>

This discussion is actually about general vector vs array performance. I
think I have shown that there *are* performance benefits to an array on the
stack or other stack based container vs std::vector. The locality of doing
things on the stack compared to jumping to the free store is obvious.
std::vector has a pointer to its elements; a stack based container does not.
I have shown this to be true in an actual program in actual reality. I did
not make up those benchmarks.

However I feel you are too blinkered for this discussion to continue any
further in a meaningful way. "There is no spoon." "There is no pointer."

/Leigh

James Kanze

Jan 4, 2010, 5:49:48 PM
On Jan 3, 4:41 pm, "Balog Pal" <p...@lib.hu> wrote:
> "James Kanze" <james.ka...@gmail.com>

> >> Come on, the difference must be evident with any stack-using
> >> implementation -- that is the most common.

> > Having actually implemented compilers for stack based
> > implementations, it's quite evident to me that the difference
> > depends, and that the extra dereference should actually be quite
> > rare.

> Well, this far you did not say anything that would explain the
> last part -- how you reach the "quite rare" figure.

It's based on two things: a knowledge of how compilers (and in
particular, their optimizers) work, and the actual uses I make
of arrays in my own code. Obviously, the second is very
personal, but really, how often do you have an array and
only access a single element?

> The situation is like having points A and C, case one: go straight line
> A->C, case two: touch point B first. B may happen to be on the line, or
> moved there by some tricks -- or the whole trip may be ignored in the full
> travel context, but the difference still is one way, and its being zero
> depends on a bunch of 'chance' factors.

> And those factors are the kind IMO you can't tell.

Nope. You can't tell. That's exactly my point. Until you
actually measure, you can't really tell which solution will be
fastest.

> >> For a direct array on the stack the base address is already
> >> in the stack or frame pointer, and needs no extra load. For
> >> the other, the base is at an arbitrary location that must be
> >> loaded into a register with an extra fetch.

> > Most of the time, on most hardware, the base pointer for the
> > array will also already be in a register. Unless you've got
> > a very bad compiler. (It's a little more difficult for
> > Intel architecture, because of the paucity of registers.
> > But Microsoft's C 1.0 managed most of the times, so it's
> > hardly leading edge, even on Intel.)

> To actually have that, the compiler must:
> - inline vector's constructor or last resize() operation that sets the data
> address

Not really. What is required is that all of the functions in a
given sequence be inlined, so that the compiler can tell that
the pointer doesn't change.

I've already admitted that the fact that the operators are
functions, and not built-in operators, can have a negative
effect on performance. That's a different issue (sometimes
referred to as the abstraction penalty).

> - the arbitrary code that follows until the next index op shall not use all
> the available data registers

Agreed. That's usually the case, however, or else why are you
using an array. (That's also the reason why I said that the
results depend on the architecture. Most of my work until
recently has been on Sparcs, with 32 registers, and of course,
optimizers have a much easier job there than on an Intel.)

> - if the code has any of vector's operations those also must
> all be inlined, or the compiler use special knowledge that
> they do not change the state

Agreed, sort of. Inlining is actually irrelevant (except that
any real function calls will slow things down); what's important
is that the compiler can "see" the code in all of the functions
involved. Since std::vector is a template, this is guaranteed
unless the compiler implements export, and the standard library
uses it. Which as far as I know doesn't leave many compilers
out of the running.

> To me it looks like too many assumptions tied to arbitrary/unknown
> code.

Excuse me, but I'm not assuming anything. I'm arguing against a
statement which said "Accessing a vector's contents via
operator[] will always be slightly slower[...]". All I have to
do to prove my point is show one counter example.

[...]


> > It won't show, because it won't be present in any critical
> > paths.

> How can you know critical paths of unknown/arbitrary code?

If it's a critical path, it will be in a tight loop; otherwise,
any extra access won't make a measurable difference. And
compilers do know how to optimize tight loops.

> > Why does everyone suppose that compilers are so stupid? This
> > sort of optimization has been present in most compilers for well
> > over 20 years now.

> Why do you assume we think that? In many posts here and elsewhere I'm
> usually the one pointing out that a healthy optimizer shall
> treat different-looking expressions the same.

I know. That's why your reaction here surprises me.

[...]


> > Note too that the old rule: "never trust a benchmark you
> > didn't falsify yourself" holds here as well. If I wanted to
> > "prove" that an extra instruction was required, I know how
> > to write a "realistic" benchmark that would do it.

> I thought we were discussing an 'equivalence' issue, and being
> serious about it, not a contest to cheat each other or the
> public.

I'm not sure about the equivalence issue---what is supposed to
be equivalent to what? But the fact remains that the one
example posted counts as a "falsified" benchmark, in the sense
that it looks at a very atypical example. A more typical
example would involve something like:

for ( int i = 0; i != v.size(); ++ i )
// something with v[i]

Or perhaps a binary search (so the compiler couldn't trivially
remap the index manipulations into pointer arithmetic).

> > Except that 1) it would really only prove the point for a
> > given compiler on a given architecture, and 2) "realistic"
> > is in the eyes of the beholder: while volatile and inline
> > assembler certainly aren't realistic, who knows where you
> > really have to draw the line.

> In general, possibly, for this case those were irrelevant and
> did not mess with the relevant content.

> > (FWIW: my case would use a virtual function returning a[i]
> > from a private member array. In my experience, virtual
> > functions are really great optimization blockers for a lot
> > of compilers---including VC++.

> Huh? Of course they are -- there is a call the compiler can't
> follow,

All compilers can in a few situations, and some can in almost
all situations. But most can't most of the time:-). And I've
designed my bench harness to make it particularly difficult for
the compiler to follow, even for those which can do so in most
cases. (I ensure, for example, that there are a similar number
of calls which resolve to different virtual functions in the
same loop. Each time I enter the loop, however, it's the same
virtual function called each time in the loop, and a compiler
could potentially detect that, and given that there are only a
couple different virtual functions, test before entering the
loop, and generate as many different versions of the loop as
necessary, with the corresponding virtual function inlined in
each case. I've yet to see, or even to hear of such a compiler,
however.)

> and the called function has license to change anything,
> including a private member vector's state. For most practical
> cases even if you use const like crazy.

> And there are many similar 'blockers'. That is why I don't
> get your claim of 'rare'; my educated guess would allow for
> at least one extra fetch somewhere -- and call its
> omission a case of good luck.

It would be interesting to see some statistics on real code. As
I said, in the code I've seen, arrays are used because the code
will loop over the elements. Which means that the compiler has
an easy job of optimizing.

> > But is it realistic to have a virtual function which just
> > makes one access to a single element?

> Err, I think I lost you; as I got it originally, the function
> may do anything, and it is arbitrary.

There's a difference between what a function *may* do (legally,
according to the language), and what it is likely to do in
practice. How often do you have, in practice, virtual functions
which access a single array element?

> > That probably depends on the application domain, but I've
> > never seen it; whenever I'm using an array, I'm accessing a
> > large number of elements locally, which means that the
> > compiler will have hoisted the pointer to the array data
> > into a register. Regardless of where that pointer comes
> > from.)

> IMO we all agreed that it *can* be hoisted.

In which case, the original statement is false.

> Also, that even if it isn't the overall speed difference is
> likely amortized.

In sum, you agree with me. You will not *always* see a
performance loss.

[...]


> Yet, denying away the difference is IMO not fair -- the users
> are better aware of the difference.

Presumably, the output of the profiler should indicate the
difference, if there is one and it is in a critical path.

[...]


> Claiming the difference as 'bullshit' is different.

What is "bullshit" is the statement that there will always be a
difference in performance, in favor of C style arrays. In
practice, it depends.

--
James Kanze

James Kanze

Jan 4, 2010, 5:52:02 PM
On Jan 4, 1:18 pm, Nick Keighley <nick_keighley_nos...@hotmail.com>
wrote:

> On 2 Jan, 19:04, James Kanze <james.ka...@gmail.com> wrote:

> > On Jan 2, 2:49 pm, "Leigh Johnston" <le...@i42.co.uk> wrote:
> > > Accessing a vector's contents via operator[] will always be
> > > slightly slower than accessing a local (stack based) array via
> > > [] as you have to dereference a pointer with the vector
> > > version.

> > That's bullshit. You have to dereference a pointer in both
> > cases. In the case of indexing, you have to calculate the
> > address (both for stack based and for std::vector), and
> > because these calculations are hidden in some functions with
> > std::vector (and probably do bounds checking as well),

> I thought operator[] didn't do bounds checking (or not
> usually) whilst at() did.

It's a quality of implementation issue. On most platforms,
operator[] will lead to an assertion failure, or something
similar, if there is a bounds error. But this is not required,
and all of the implementations I know of have a means of turning
this off.

--
James Kanze

Leigh Johnston

Jan 4, 2010, 5:55:45 PM
>
>> volatile is not relevant to this discussion, it was simply
>> used to illustrate that *at least one* extra pointer
>> dereference is required even if this is subsequently cached in
>> a register. VC++ would optimize both assignments completely
>> away without it.
>
> Volatile is very relevant, because anything which affects
> optimization is very relevant when you start speaking of
> generated code.

If you look elsewhere in this thread you will see another example without
the use of volatile which still generates an extra instruction for the
std::vector pointer dereference.

>
>> You *cannot* avoid *at least one* extra pointer dereference
>> when using std::vector compared to a stack based container or
>> array.
>
> Nevertheless, most compilers do, most of the time.

Not true: there must be at least one pointer dereference, which may then be
cached in a register in subsequent code.

>
>> BTW I am not eschewing the use of std::vector due to an
>> optimization gain that varies wildly depending on use-case, I
>> hardly use stack based arrays now, I use std::vector a lot or
>> my own stack based container.
>
> Certainly. I understand that. I still think your basic precept
> is wrong. Compilers are not at all as dumb as you seem to
> think.

My basic precept is not wrong: what I originally said translates to there
being a pointer dereference which can then be cached in a register before a
loop (or other code). No such pointer dereference is required for an array
on the stack accessed locally within the same scope (not passed to a
function). The original post of this thread had all the code in the same
function: main().

/Leigh

Yannick Tremblay

Jan 5, 2010, 5:56:33 AM
In article <84qdnS7bTIaE8d_W...@giganews.com>,

Leigh Johnston <le...@i42.co.uk> wrote:
>>
>> For this example, with this compiler, on this machine. In
>> practice, I've had similar results with Sun CC on a Sparc, so
>> your measurements aren't unique. But in this case, you're
>> measuring the time necessary to grow the array, which is not the
>> same thing as the time necessary for an access.
>>
>> If you'll remember, your original statement was: "Accessing a
>> vector's contents via operator[] will always be slightly slower
>> than accessing a local (stack based) array via [] as you have to
>> dereference a pointer with the vector version." That's the only
>> statement I'm disagreeing with, and in that statement, what I'm
>> really disagreeing with is the *always*. Because in most cases,
>> I think that the "dereference a pointer" simply won't be present
>> in the individual accesses. Or it will be present with the
>> "stack based array", because you'll have passed it to a
>> function. Your statement simply doesn't correspond to the
>> reality in actual programs.
>>
>
>This discussion is actually about general vector vs array performance.

Leigh, I am sorry but you clearly made the statement yourself
specifically about the operator[] of std::vector and then used that to
promote the code hosted on your own website...

You have a valid point that under some circumstances a stack array can
have a performance advantage over std::vector, but that was not what
your over-the-top statement said.

>I think I have shown that there *are* performance benefits to a array on the
>stack or other stack based container vs std::vector.

You have shown that for some artificial, custom-designed benchmarks
that are not related to your initial claim, your stack based container
outperforms a std::vector.

If you are interested in more complex and more thorough benchmarks,
have a look at Stepanov's writings:
http://www.stepanovpapers.com/
For example:
http://www.stepanovpapers.com/container_benchmark.cpp

Obviously, it can be argued that these particular benchmarks were
written by someone who had a vested interest in demonstrating
that std::vector is performant, and hence are probably biased in a
similar way. However, these articles and benchmarks have been published
and thoroughly peer reviewed, so I would suggest that at the minimum
the amount of bias is somewhat more limited.

>The locality of doing
>things on the stack compared to jumping to the free store are obvious.
>std::vector has a pointer to its elements, stack based container does not.
>I have shown this to be true in an actual program in actual reality. I did
>not make up those benchmarks.

Depends on the circumstances... It is possible (maybe even probable)
that a small automatic array on the local stack would be faster than
using a std::vector. However, you are neglecting a lot of things that
can happen in real programs that do actual work. One should always be
careful with artificial benchmarks. By simplifying to one particular
instruction and attempting to measure it independently, you are not
measuring what happens in an actual running system. Don't forget that
the observer effect also exists in computing...

In a real program:
- The free store might be in the cache
- The pointer to the free store might be in a register
- The code might be inlined
- Having large arrays on the stack might decrease locality of other
data, therefore having an overall negative effect on performance
- Stack space might be more limited than free store space
- If doing real work, a stack based array might be passed by
reference/pointer to a function (unless you are a fan of 1000-line functions)
- etc

>However I feel you are too blinkered for this discussion to continue any
>further in a meaningful way. "There is no spoon" "There is no pointer".

Pot, kettle, black...

I don't think anyone is disagreeing with statements that stack based
arrays can have performance benefits over std::vector. What some
people are disagreeing with are the absolute statements, and also whether
any such performance benefit would be significant in a real program.

Regards

Yannick

Leigh Johnston

Jan 5, 2010, 7:18:29 AM
>
> Leigh, I am sorry but you clearly made the statement yourself
> specifically about the operator[] of std::vector and then used that to
> promote the code hosted on your own website...
>
> You have valid point that under some circumstance stack array can
> have a performance advantage over std::vector but that was not what
> your over the top statement said.
>

I also stated in my original post that the dereference can be optimized to
be only done once, i.e. put into a register.

/Leigh

Yannick Tremblay

Jan 5, 2010, 7:38:38 AM
In article <EtKdnc1YleZWC93W...@giganews.com>,

Leigh Johnston <le...@i42.co.uk> wrote:
>We can come up with examples and counter-examples all day long.
>

Indeed, so let's make a little modification to consider only operator[],
since that is what your original statement referred to.

>Consider this:
>
>int main()
>{
> std::vector<int> v;
> v.reserve(1000);
> lib::vecarray<int, 1000> va;
> {
> timer t("vector: ");
> for (int i = 0; i != 1000000; ++i)
> {
> for (int j = 0; j != 1000; ++j)
> v.push_back(j);
> v.clear();
> }
> }
> {
> timer t("vecarray: ");
> for (int i = 0; i != 1000000; ++i)
> {
> for (int j = 0; j != 1000; ++j)
> va.push_back(j);
> va.clear();
> }
> }
>}

So let's make a few mods to prefill the vectors, since the claim
referred to operator[], not push_back() or clear():

#include <vector>
#include "vecarray.h"
#include <boost/progress.hpp>
#include <iostream>

size_t const VSIZE = 100000;
size_t const LOOP = 100000;

int main()
{
lib::vecarray<int, VSIZE> va;
va.resize(VSIZE);
std::vector<int> v;
v.resize(VSIZE);
{
std::cout << "vector: ";
boost::progress_timer t;
for (size_t i = 0; i < LOOP; ++i)
{
for (size_t j = 0; j < v.size(); ++j)
v[j] = j;
}
}
{
std::cout << "vecarray: ";
boost::progress_timer t;
for (size_t i = 0; i < LOOP; ++i)
{
for (size_t j = 0; j < va.size() ; ++j)
va[j] = j;
}
}
}

Compiled with g++ -std=gnu++0x -O3

vector: 4.70 s
vecarray: 5.72 s

Over a few runs, the vector time varies from 4.69s to 4.71s, but the
vecarray varies greatly, from 5.14s to 6.09s.

Hmm, interesting.... Care to explain what's happening, and why the
vecarray is now slower than std::vector, if the statement that operator[]
for a vector will always be slower than for a vecarray is true?

I am sure this benchmark has flaws, but it is not obviously more flawed
than the previously posted ones. Neither is particularly realistic,
but they show opposite results, and one of them shows what is arguably
counter-intuitive...

>Which outputs (several runs done, VC++ release build, secure STL disabled):
>
>vector: 2.9741 seconds
>vecarray: 2.6782 seconds
>
>Which indicates (for this example) that a stack based container
>(http://i42.co.uk/stuff/vecarray.htm) is 11% quicker than std::vector.
>
>1) std::vector contains a pointer to its elements, a stack based container
>(or an ordinary array on the stack) does not
>2) you may benefit from the locality of your container and other local
>variables (think CPU cache)
>
>Have you seen the film "The Matrix"? "There is no spoon." "There is no
>pointer.".

So how would The Matrix explain the faster std::vector then? Is there
a fork somewhere?

Regards

Yannick

Leigh Johnston

Jan 5, 2010, 7:44:08 AM
Those benchmarks do not take stack locality into account, as the array
version is not of the local stack based variety: it is accessed with an
extra indirection (a pointer passed to a function) and not accessed locally.

You mention my website, well on it it actually says "Unlike a std::vector
which has to store a pointer to element memory (to support reallocation) and
dereference that pointer each time an operation such as push_back() is
called the memory for vecarray's elements is from an array which is a member
of the vecarray itself. Unlike a std::vector a vecarray (and its elements)
can be placed easily on the stack allowing quick stack frame based element
access."

I should have just copied this into my original post rather than
concentrating on operator[], however I did not wish to promote my container,
hence my rushed Usenet posting; I only mentioned it later on as it is easy
to benchmark vs std::vector for standard container operations. VC++ seems
to have an aggressive optimizer and yet I have shown that a stack based
container can outperform std::vector for certain operations, which is what
my original claim attempted to convey. Saying this is not the case for
"real" programs that do "actual work" is just plain nonsense. What the hell
is a "real" program supposed to be? A "real" program might use iterators
with <algorithm> for example, in which case the performance gains may be
lost; but only part of such a real program will be using iterators, other
parts will not, therefore it is a nonsense claim.

/Leigh

Leigh Johnston

Jan 5, 2010, 8:02:10 AM
For some reason the optimizer is not optimizing out the call to size() for
vecarray, which is interesting.

If I change the loops to

for (size_t j = 0; j < VSIZE; ++j)
v[j] = j;

I get the following:

vector: 5.3602 seconds
vecarray: 5.2557 seconds

Examples and counter-examples.

"There is no spoon" "There is no pointer"

Leigh Johnston

Jan 5, 2010, 8:08:16 AM
Intuitively it is correct for the optimizer to call size() each time, as the
non-const version of operator[] should be being called; why this is only
happening for vecarray and not vector I do not know.

/Leigh

Leigh Johnston

Jan 5, 2010, 8:19:25 AM
Of course I am aware that the optimizer can look past constness if the
functions are inlined.

Bo Persson

Jan 5, 2010, 8:23:06 AM

A wild guess would be alias analysis.

In vecarray we have some nifty reinterpret_casts and accessing memory
through an array of char. We know that chars are sometimes allowed to
alias other objects.

In the std::vector, we have a pointer to an int buffer and a size_t,
which cannot possibly overlap.


Or it could be something completely different. :-)


Bo Persson


Yannick Tremblay

Jan 5, 2010, 10:13:56 AM
In article <xsKdnfOpwfzOpd7W...@giganews.com>,

Leigh Johnston <le...@i42.co.uk> wrote:
>For some reason the optimizer is not optimizing out the call to size() for
>vecarray which is interesting.
>
>If I change the loops to
>
>for (size_t j = 0; j < VSIZE; ++j)
> v[j] = j;
>
>I get the following:
>
>vector: 5.3602 seconds
>vecarray: 5.2557 seconds

Same here. It improves vecarray. I now get 4.70s for both.
Seems like my std::vector is more performant than yours :-)

Interestingly, if I increase the size of the array, this makes vecarray
lag behind again. The results also become less stable, but this would
seem to show that a large vecarray on the stack is less performant than
a std::vector:

vector: 8.15 s
vecarray: 8.39 s

vector: 8.21 s
vecarray: 8.55 s

vector: 7.71 s
vecarray: 8.39 s

vector: 7.45 s
vecarray: 8.18 s

If I reduce the size of the array and increase LOOP, the reverse
happens and vecarray becomes a bit more performant.

size_t const VSIZE = 1000000;
size_t const LOOP = 10000;

int main()
{
    lib::vecarray<int, VSIZE> va;
    va.resize(VSIZE);
    std::vector<int> v;
    v.resize(VSIZE);

    {
        std::cout << "vector: ";
        boost::progress_timer t;
        for (size_t i = 0; i < LOOP; ++i)
        {
            for (size_t j = 0; j < VSIZE; ++j)
                v[j] = j;
        }
    }

    {
        std::cout << "vecarray: ";
        boost::progress_timer t;
        for (size_t i = 0; i < LOOP; ++i)
        {
            for (size_t j = 0; j < VSIZE; ++j)
                va[j] = j;
        }
    }
}
Leigh Johnston

Jan 5, 2010, 10:32:10 AM
I would expect them to perform about the same for this use-case, as I said
in my O.P. that the pointer dereference of operator[] can be optimized to be
done only once (cached in a register) before a loop.

It is annoying and strange, though, that size() is not being optimized out
by both compilers, VC++ and g++.

/Leigh

Balog Pal

Jan 5, 2010, 11:44:20 AM
"Leigh Johnston" <le...@i42.co.uk>

Just to try Bo's theory, can you test what happens if you simply define the
data holder as T[N], without the aligning union, and remove the
reinterpret_casts?

And if no change, make operator[] not call begin()?

Leigh Johnston

Jan 5, 2010, 11:44:55 AM

"Balog Pal" <pa...@lib.hu> wrote in message
news:hhvq1q$2db9$1...@news.ett.com.ua...

The begin() is being inlined/optimized away OK. There is not much point in
having T[N]: may as well just try std::tr1::array instead, as the point of
vecarray is to have a variable (max) size container that can be placed on
the stack. :)

/Leigh

Leigh Johnston

Jan 5, 2010, 12:14:20 PM
I tried T[N], removing the casts and the call to begin(), and interestingly
size() is still not optimized out.
