I'd be intersted if any other vector implemenations uses a growth factor
other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since I
don't have that compiler here).
Thanks,
Scott
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]
> I've been doing other stuff lately, so this may be old news (though I
> wasn't able to find any mention of this via Google), but I just noticed
> that the growth factor for vector in the MS7.1 implementation seems to be
> 1.5 instead of the more traditional 2 (as I believe it was in VC6).  My
> other implementations (g++ 3.2 mingw and Comeau 4.3.3) continue to grow by
> a factor of 2 when a push_back exceeds capacity.
>
> I'd be intersted if any other vector implemenations uses a growth factor
> other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since I
> don't have that compiler here).
VC++ V7.0 also uses 1.5. We settled on that factor as a better balance
between memory utilization and expansion efficiency.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
See this thread:
<http://groups.google.com/groups?threadm=3b6805d7%241%40primark.com>
> (as I believe it was in VC6).
You remember correctly.
> My other implementations (g++ 3.2 mingw and Comeau 4.3.3) continue to
> grow by a factor of 2 when a push_back exceeds capacity.
> 
> I'd be intersted if any other vector implemenations uses a growth
> factor other than 2, and I'd also like to know whether VC7 uses 1.5
> or 2 (since I don't have that compiler here).
VC++ 7 uses a factor of 1.5.
In article MPG.1a11d4f0a...@news.hevanet.com, Scott Meyers at
Use...@aristeia.com wrote on 11/5/03 4:03 AM:
> I've been doing other stuff lately, so this may be old news (though I
> wasn't able to find any mention of this via Google), but I just noticed
> that the growth factor for vector in the MS7.1 implementation seems to be
> 1.5 instead of the more traditional 2 (as I believe it was in VC6).  My
> other implementations (g++ 3.2 mingw and Comeau 4.3.3) continue to grow by
> a factor of 2 when a push_back exceeds capacity.
> 
> I'd be intersted if any other vector implemenations uses a growth factor
> other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since I
> don't have that compiler here).
> 
> Thanks,
> 
> Scott
-- 
Glenn P. Downing
The University of Texas at Austin
Department of Computer Sciences
Taylor Hall 2.124
Austin, TX 78712
(512) 471-7316
http://www.cs.utexas.edu/users/downing/
Scott Meyers schrieb:
> I've been doing other stuff lately, so this may be old news (though I
> wasn't able to find any mention of this via Google), but I just noticed
> that the growth factor for vector in the MS7.1 implementation seems to be
> 1.5 instead of the more traditional 2 (as I believe it was in VC6).  My
> other implementations (g++ 3.2 mingw and Comeau 4.3.3) continue to grow by
> a factor of 2 when a push_back exceeds capacity.
>
> I'd be intersted if any other vector implemenations uses a growth factor
> other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since I
> don't have that compiler here).
As far as I remember, a factor near 1.5 results from an analysis from Andrew
Koenig (?), which takes several general aspects of allocation processes
into account. Regrettably I forgot the quote from the corresponding article,
but
surely there will remember another one.
Greetings from Bremen,
Daniel
VC6 indeed seems to have a factor of 2.
>                                                                      My
> other implementations (g++ 3.2 mingw and Comeau 4.3.3) continue to grow by
> a factor of 2 when a push_back exceeds capacity.
>
> I'd be intersted if any other vector implemenations uses a growth factor
> other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since I
> don't have that compiler here).
  Besides VC 6/7.1 I have access to Metrowerks'
  MSL (CW8) and STLport (4.5).
  Both use a factor of 2.
> Thanks,
>
> Scott
Schobi
-- 
Spam...@gmx.de is never read
I'm Schobi at suespammers org
"And why should I know better by now/When I'm old enough not to?"
  Beth Orton
There is a technical reason to prefer 1.5 to 2 -- more specifically, to
prefer values less than 1+sqrt(5)/2.
Suppose you are using a first-fit memory allocator, and you're progressively
appending to a vector.  Then each time you reallocate, you allocate new
memory, copy the elements, then free the old memory.  That leaves a gap, and
it would be nice to be able to use that memory eventually.  If the vector
grows too rapidly, it will always be too big for the available memory.
It turns out that if the growth factor is >= 1+sqrt(5)/2, the new memory
will always be too big for the hole that has been left sofar; if it is
<1+sqrt(5)/2, the new memory will eventually fit.  So 1.5 is small enough to
allow the memory to be recycled.
VC7 uses 1.5. I wonder why. This reduces portability, since 2 is
probably the most common (and well-known) factor. It also changes the
performance characteristics of code written in older times.
It's hard to tell which factor is best in the absolute sense, so I'd
prefer that to be a parameter. Just changing a hard-coded number
doesn't seem a good solution to me.
- Vahan
 > There is a technical reason to prefer 1.5 to 2 -- more specifically, to
 > prefer values less than 1+sqrt(5)/2.
Since 1+sqrt(5)/2 > 2, correcting that typo gives (1+sqrt(5))/2.
John
Sorry, I meant (1+sqrt(5))/2.
Didn't you write an article on this one time?  If so, can you please post a reference and, if
available, a URL?
Thanks,
Scott
It's hard for me to see this as "reducing portability".  There is nothing in the
Standard that requires or even implies that std::vectors should reallocate in
multiples of 2.  If client code is relying on std::vector growing by multiples
of 2, then I would say that the client code is reducing portability, not VC.
>> I'd be intersted if any other vector implemenations uses a growth factor
>> other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since
>> I
>> don't have that compiler here).
>
> There is a technical reason to prefer 1.5 to 2 -- more specifically, to
> prefer values less than 1+sqrt(5)/2.
>
> Suppose you are using a first-fit memory allocator, and you're
> progressively
> appending to a vector.  Then each time you reallocate, you allocate new
> memory, copy the elements, then free the old memory.  That leaves a gap,
> and
> it would be nice to be able to use that memory eventually.  If the vector
> grows too rapidly, it will always be too big for the available memory.
>
> It turns out that if the growth factor is >= 1+sqrt(5)/2, the new memory
> will always be too big for the hole that has been left sofar; if it is
> <1+sqrt(5)/2, the new memory will eventually fit.  So 1.5 is small enough
> to
> allow the memory to be recycled.
 Yes,small correction, less then (1+sqrt(5))/2 that is less then 1.6~1.7
-- 
  grzegorz
I fail to see how an implementation detail such as the growth factor can 
possibly effect portability.
> It also changes the
>performance characteristics of code written in older times.
Yes, but that is not what we mean by portability. There are many aspects 
of performance that change with new releases of compilers.  Actually the 
amount and types of available memory influences performance in a far 
more dramatic way. The existence of large cache memory actually means 
that vector<T> will often out perform list<T> for insertions even though 
list<T> is theoretically much superior when uniform memory is used.
The innovative feature of the C++ Standard Library was the provision of 
certain performance guarantees.
>
>It's hard to tell which factor is best in the absolute sense, so I'd
>prefer that to be a parameter. Just changing a hard-coded number
>doesn't seem a good solution to me.
Possibly, but hard coded values do allow certain optimisations and in 
another post Andy Koenig points out the a growth factor of 1.5 allows a 
memory usage optimisation that is not available to systems using a 
factor of 2.0
-- 
Francis Glassborow      ACCU
If you are not using up-to-date virus protection you should not be reading
this. Viruses do not just hurt the infected but the whole community.
Andrew Koenig schrieb:
> There is a technical reason to prefer 1.5 to 2 -- more specifically, to
> prefer values less than 1+sqrt(5)/2.
>
> Suppose you are using a first-fit memory allocator, and you're progressively
> appending to a vector.  Then each time you reallocate, you allocate new
> memory, copy the elements, then free the old memory.  That leaves a gap, and
> it would be nice to be able to use that memory eventually.  If the vector
> grows too rapidly, it will always be too big for the available memory.
>
> It turns out that if the growth factor is >= 1+sqrt(5)/2, the new memory
> will always be too big for the hole that has been left sofar; if it is
> <1+sqrt(5)/2, the new memory will eventually fit.  So 1.5 is small enough to
> allow the memory to be recycled.
Hello Andrew Koenig,
so my remembrance took me right! Is it correct, that you wrote an article about
the
proof of this dependency? Can you please give me quote for it?
Thank you very much!
Greetings from Bremen,
Daniel Spangenberg
> > I'd be intersted if any other vector implemenations uses a growth factor
> > other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since
I
> > don't have that compiler here).
>
> There is a technical reason to prefer 1.5 to 2 -- more specifically, to
> prefer values less than 1+sqrt(5)/2.
But 1+sqrt(5)/2 = 2.118...  The OP's question is about 1.5 versus 2.0, and
both are less than 1+sqrt(5)/2.
> It turns out that if the growth factor is >= 1+sqrt(5)/2, the new memory
> will always be too big for the hole that has been left sofar; if it is
> <1+sqrt(5)/2, the new memory will eventually fit.  So 1.5 is small enough
to
> allow the memory to be recycled.
Also, where did the sqrt(5) come from?
--
+++++++++++
Siemel Naran
> > I'd be intersted if any other vector implemenations uses a growth factor
> > other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since
I
> > don't have that compiler here).
>
> VC7 uses 1.5. I wonder why. This reduces portability, since 2 is
> probably the most common (and well-known) factor. It also changes the
> performance characteristics of code written in older times.
The factor 1.5 or 2.0 is an implementation detail and should not affected
the portability of the code in the sense that the code should compile and
run as expected on all platforms.  It may affect performance characteristics
as you mention.  Members of this newsgroup say the 1.5 method leads to
better memory usage.  Of course, use vector::reserve whenever you can.
> It's hard to tell which factor is best in the absolute sense, so I'd
> prefer that to be a parameter. Just changing a hard-coded number
> doesn't seem a good solution to me.
Implementations may add an additional template parameter to std::vector.
Here's an attempt
   template <class T, class Allocator=std::allocator<T>, double mult=1.5>
   class vector;
But non-template parameters must be integral constants. So how to do it?
Second, how do we multiply the size by 1.5?  Do you divide by 2 then
multiply by 3, or what?
--
+++++++++++
Siemel Naran
I cannot see how a groing factor could impact portability. I think that you
should give your definition of portability, because it doesn't seem to fit
its usual meaning in C++...
> It also changes the
> performance characteristics of code written in older times.
Well, if you got critical counter-performance by using the automatic
allocation scheme of vectors, i am not sure any growth factor is a correct
answer.
However, your second assertion is somehow in contradiction with your first
one. If you talk about portability, you cannot assume that the allocation
scheme is equally performing on different systems, even if the growth factor
is the same...
Chris
You wrote:
> I'd be intersted if any other vector implemenations uses a growth factor
> other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since I
> don't have that compiler here).
IBM's VisualAge C++ 5.0.2.2 on AIX also seems to use a growth
factor of 1.5. A small test program which push_back()s elements
onto an (initially empty) vector yields the following sequence
of capacities:
 size  capacity
    1     1
    2     2
    3     3
    4     4
    5     6
    7     9
   10    13
   14    19
   20    28
   29    42
   43    63
   64    94
   95   141
  142   211
  212   316
  317   474
  475   711
  712  1066
 1067  1599
 1600  2398
 2399  3597
 3598  5395
 5396  8092
 8093 12138
(size indicates which vector size caused a change in the
capacity).
Kind regards
    Ingolf
-- 
Ingolf Steinbach                       Jena-Optronik GmbH
ingolf.s...@jena-optronik.de       ++49 3641 200-147
PGP: 0x7B3B5661  213C 828E 0C92 16B5  05D0 4D5B A324 EC04
> Scott Meyers <Use...@aristeia.com> wrote in message
> news:<MPG.1a11d4f0a...@news.hevanet.com>...
>> I'd be intersted if any other vector implemenations uses a growth
>> factor other than 2, and I'd also like to know whether VC7 uses 1.5
>> or 2 (since I don't have that compiler here).
>> 
>
> VC7 uses 1.5. I wonder why. This reduces portability, since 2 is
> probably the most common (and well-known) factor. It also changes
> the performance characteristics of code written in older times.
>
> It's hard to tell which factor is best in the absolute sense, so I'd
> prefer that to be a parameter. Just changing a hard-coded number
> doesn't seem a good solution to me.
Isn't that what the reserve() member function is for? Since the growth
factor is not specified by the standard, and probably not even
documented for your platform, relying on any particular growth factor
is inherently non-portable. Using reserve() you can guarantee (at
least) enough memory for a given number of elements in a fully
portable manner. Otherwise, you can just rely on a quality
implementation doing something reasonable.
-- 
Raoul Gough.
(setq dabbrev-case-fold-search nil)
Surely, if the growth factor is >= 2 the new memory will always
be too big for the hole that has been left sofar; if it is < 2,
the new memory will eventually fit.
Presumably the reason for (1+sqrt(5))/2 is...
Initial allocation is s.
The first resize is k*s.
The second resize is k*k*s, which'll fit the hole
iff k*k*s <= k*s+s  I.e. iff k <= (1+sqrt(5))/2
...the hole can be recycled asap.
It could, by storing its previous size, grow fibonaccily.
Louis.
I calculated 1+sqrt(5)/2 as 2.118
Did you mean (1+sqrt(5))/2 ? 
(That's about 1.618)
Michael
> Also, where did the sqrt(5) come from?
From the relevant root of the equation k^2 - k - 1 = 0. ;-)
-Howard
> But 1+sqrt(5)/2 = 2.118...  The OP's question is about 1.5 versus 2.0, and
> both are less than 1+sqrt(5)/2.
Actually, (1+sqrt(5))/2. The golden section.
> Also, where did the sqrt(5) come from?
From finding a ratio a:b such that
a:b = (a+b):a
This leads to a quadratic equation whose solution is the prominent
golden section. (That is also related to the fibunacci sequence, btw.)
Greetings,
        Thomas
Does it matter ?
(i don't think you are talking about some performance issue, because the
allocation and copy is most likely to consume more than 99% of the time that
the function will be running)
Chris
Andrew Koenig means Phi, the Golden Ratio, the ratio of 2 successive terms
that the Fibonacchi series converges to  1,2,3,5,8,13,21,34,55,...
It is also the ratio of a long diagonal of a regular pentagon to its edge.
Phi is (sqrt(5.0)+1.0)/2.0
It has some almost magical properties
1/ Phi     = 0.6180339887498948482045868343656...
Phi          = 1.6180339887498948482045868343656...
Phi * Phi = 2.6180339887498948482045868343656...
and in general
pow(Phi, n+1) = pow(Phi, n) + pow(Phi, n-1)
Stephen Howe
i += i>>1;
-- 
Francis Glassborow      ACCU
If you are not using up-to-date virus protection you should not be reading
this. Viruses do not just hurt the infected but the whole community.
> Also, where did the sqrt(5) come from?
It is (1 + sqrt(5))/2 which is the golgen ratio that answers
all questions.  :)
The solution to x^2 - x - 1 = 0 which is the equation that you
get when you ask for the memory freed by prior increases to sum
to the amount needed for the next one.
John
> Surely, if the growth factor is >= 2 the new memory will always
> be too big for the hole that has been left sofar;
Yes.
> if it is < 2, the new memory will eventually fit.
No.
Try 1.8.
size  free
0     0
1     0
2     0
3     1
5     3
9     6
16    11
Please continue the exercise until your "eventually" happens.
John
Siemel Naran schrieb:
> "Andrew Koenig" <a...@acm.org> wrote in message news:FUbqb.206398
>
> > > I'd be intersted if any other vector implemenations uses a growth factor
> > > other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since
> I
> > > don't have that compiler here).
> >
> > There is a technical reason to prefer 1.5 to 2 -- more specifically, to
> > prefer values less than 1+sqrt(5)/2.
>
> But 1+sqrt(5)/2 = 2.118...  The OP's question is about 1.5 versus 2.0, and
> both are less than 1+sqrt(5)/2.
>
> > It turns out that if the growth factor is >= 1+sqrt(5)/2, the new memory
> > will always be too big for the hole that has been left sofar; if it is
> > <1+sqrt(5)/2, the new memory will eventually fit.  So 1.5 is small enough
> to
> > allow the memory to be recycled.
>
> Also, where did the sqrt(5) come from?
I am quite sure, he meant (1+sqrt(5))/2, which is also known as the Golden
ratio.
This number often occurs during analysis of repeated divisions...
Daniel
[snip
> > It turns out that if the growth factor is >= 1+sqrt(5)/2, the new memory
> > will always be too big for the hole that has been left sofar; if it is
> > <1+sqrt(5)/2, the new memory will eventually fit.  So 1.5 is small
enough
> to
> > allow the memory to be recycled.
>
> Surely, if the growth factor is >= 2 the new memory will always
> be too big for the hole that has been left sofar; if it is < 2,
> the new memory will eventually fit.
>
> Presumably the reason for (1+sqrt(5))/2 is...
>
> Initial allocation is s.
> The first resize is k*s.
> The second resize is k*k*s, which'll fit the hole
> iff k*k*s <= k*s+s  I.e. iff k <= (1+sqrt(5))/2
>
> ...the hole can be recycled asap.
Ah! The hole still contains the current data :-(
So it's as you say, k must be < (1+sqrt(5))/2 for
an eventual fit.
Louis.
P.S. Little point in k < (0.5+sqrt(69)/18)^1/3+(0.5-sqrt(69)/18)^1/3
     (which is its recylce asap value).
The calculation involves a bit of somewhat non-trivial math,
but the theory seems to be the following (and I'm just
guessing since I haven't seen the article):
Let the increment ratio be 'x'.  Subsequent allocations
are therefore x, x^2, x^3, x^4, ... (times the size of the
initial allocation, but that doesn't affect the ratios).
When the block of size 'x^n' becomes full, a new one
of size 'x^(n+1)' needs to be allocated.  It would be good
if this block fit into the size of already de-allocated blocks.
Because block 'x^n' must be present for copying stuff into
the new block, the sum of the sizes of the previous blocks
(x, x^2, x^3, ..., x^(n-1) ) should be less than or equal to
the size of the next block.  x^(n+1) .  This leads to a formula
x^= x+1, to which (1+sqrt(5))/2 is the positive solution.
This is a nice theory, but it neglects a number of factors:
 - Administrative data of the memory allocator is a constant
    added to every allocation
 - Filling the vector may need to allocate memory, ruining the
    idea that the "next" block always fits to what is left from
    the previous allocations
 - This scheme pays off only once.  It is based on the idea
    that *all* of the past deallocations accumulate a memory
    block that is enough to contain a "next" allocation.  When
    this has been done once, the past deallocations block
    can never again contain *all* of the past deallocations.
To fight some of the drawbacks, the ratio is lowered from
(1+sqrt(5))/2 ~1.618 to, say - like in the topical case - 1.5 .
Some drawbacks still remain.
Cheers!
- Risto -
> Presumably the reason for (1+sqrt(5))/2 is...
>
> Initial allocation is s.
> The first resize is k*s.
> The second resize is k*k*s, which'll fit the hole
> iff k*k*s <= k*s+s  I.e. iff k <= (1+sqrt(5))/2
Dumb question. What hole?
--
+++++++++++
Siemel Naran
Louis Lavery wrote:
> Presumably the reason for (1+sqrt(5))/2 is...
> 
> Initial allocation is s.
> The first resize is k*s.
> The second resize is k*k*s, which'll fit the hole
> iff k*k*s <= k*s+s  I.e. iff k <= (1+sqrt(5))/2
> 
> ...the hole can be recycled asap.
It's not that easy.
Note that in order to grow the vector, you need to *first* allocate the 
new memory, copy the elements and *after that* you can free the old block.
This means that the condition for recycling is not what you wrote, 
because in order to allocate k*k*s you still need the k*s still in 
place, so you cannot count it in the gap.
The gap will accumulate and after few reallocations the gap will be big 
enough to accommodate the new vector, but it is not the second 
allocation, no matter what is the growing factor.
The condition for memory reuse is this:
(k^n)*s <= Sum(i=0, n-2, (k^i)*s)
For example, in order to reuse the gap in fourth growing step, we have:
k*k*k*k*s <= s + k*s + k*k*s
because the k*k*k*s is still needed while k*k*k*k*s is being allocated.
It is too early in the morning for me to bite the above as a 
mathematician, but as a programmer I was able to write this:
#include <iostream>
using namespace std;
int main()
{
     double k;
     cout << "enter the growing factor: ";
     cin >> k;
     double size = 1.0;
     double newsize;
     double gap = 0.0;
     int steps = 1;
     while ((newsize = k * size) > gap)
     {
         gap += size;
         size = newsize;
         ++steps;
     }
     cout << "the gap is reused after " << steps
         << " steps, with size " << newsize
         << " bigger than initially\n";
}
The growing factor 1.5 guarantees that the gap will be reused in 5th 
step, growing the vector ~7.6 times (the accumulated gap will be ~8*s).
The factor 1.6 reuses the gap in 8th step, growing vector 42.9 times.
The factor 1.7 overflows, meaning that the value between 1.6 and 1.7 
(presumably (1+sqrt(2))/2) is indeed the critical one, but in the 
asymptotic way, not as the guarantee of instant reuse.
> It could, by storing its previous size, grow fibonaccily.
Interesting, but it would not be able to meet the "amortized constant 
cost of unit grow" requirement. This requirement boils down to constant 
factors.
-- 
Maciej Sobczak : http://www.msobczak.com/
Programming    : http://www.msobczak.com/prog/
My definition of portability (admittedly not necessarily universally
accepted) is the inverse of the effort it takes to take the code to
another platform and make it work. This includes the performance: if
the code compiles on a different platform but performs noticeably
worse, we usually don't consider porting complete.
> Yes, but that is not what we mean by portability. There are many aspects 
> of performance that change with new releases of compilers.  Actually the 
> amount and types of available memory influences performance in a far 
> more dramatic way. The existence of large cache memory actually means 
> that vector<T> will often out perform list<T> for insertions even though 
> list<T> is theoretically much superior when uniform memory is used.
I agree that there are many factors beyond our control that affect
portability of performance of our code. File system operations may
have different complexities under different OS's, compilers are
different, etc. In some casese those things force rewrites (as in the
case you've mentioned). That's the inevitable evil. However, in this
case:
a) This is something under our control, so why introduce an additional
problem? I for one was unpleasantly surprised when I found out that
the factor changes from platform to platform.
Why not use a common default and use an extra integer template
parameter that specifies the reallocation factor (in percents, for
instance)?
b) Containers and algorithms are very important determinants of
performance, if not the most important ones. I don't really care if
mkdir works slower or faster than CreateDirectory, that won't affect
the performance of my program. But I would like to be certain about
the characteristics of the containers, because the performance of
programs crucially depends on them.
Then of course I can write my own containers and not use STL. Yet, it
has been serving me well mostly, and I'd like to think of it as a
serious and usable library.
Some portability (my definition) issues that I've encountered in STL:
1. The std::list::size/splice performance. One has to be careful not
to use std::list's size() thoughtlessly, or write a wrapper, or
something. IOW, perform additional effort.
2. Hash tables. Unlike std::set, which, in my experience, behaves
pretty uniformly, hash performance differs significanly. There are
many implementation strategies, policies involved. I prefer to drag a
single implementation from platform to platform, rather than rely on
the platform-specific implementation. Btw, last I checked, the
proposal for hash-tables doesn't eliminate some of the choices (but I
may be outdated on this), and that's, in my view, a defect.
> The innovative feature of the C++ Standard Library was the provision of 
> certain performance guarantees.
What I suggest is that perhaps the guarantees should be even stricter.
> >It's hard to tell which factor is best in the absolute sense, so I'd
> >prefer that to be a parameter. Just changing a hard-coded number
> >doesn't seem a good solution to me.
> 
> Possibly, but hard coded values do allow certain optimisations and in 
> another post Andy Koenig points out the a growth factor of 1.5 allows a 
> memory usage optimisation that is not available to systems using a 
> factor of 2.0
It's not the specific factor that concerns me, it's my unawareness of
the factor when I move to another platform.
- Vahan
"Andrew Koenig" <a...@acm.org> wrote in message
news:FUbqb.206398$0v4.16...@bgtnsc04-news.ops.worldnet.att.net...
> > I'd be intersted if any other vector implemenations uses a growth factor
> > other than 2, and I'd also like to know whether VC7 uses 1.5 or 2 (since
I
> > don't have that compiler here).
>
> There is a technical reason to prefer 1.5 to 2 -- more specifically, to
> prefer values less than 1+sqrt(5)/2.
>
> Suppose you are using a first-fit memory allocator, and you're
progressively
> appending to a vector.  Then each time you reallocate, you allocate new
> memory, copy the elements, then free the old memory.  That leaves a gap,
and
> it would be nice to be able to use that memory eventually.  If the vector
> grows too rapidly, it will always be too big for the available memory.
>
> It turns out that if the growth factor is >= (1+sqrt(5))/2, the new memory
> will always be too big for the hole that has been left sofar; if it is
> <(1+sqrt(5))/2, the new memory will eventually fit. So 1.5 is small
enough to
> allow the memory to be recycled.
[Fixed the obvious but crucial typos in the quoted material]
This is a novel idea, and I understand the underlying theory.
However, according to my calculations, using growth factor
of 1.5 generates almost exactly twice as much copying than
using the growth factor 2 in the long run.
However, the difference in the amount of copying taking
place would allow a modification to the growth factor 2
algorithm which, after copying stuff from the old block to
the new block, would free the old block and allocate yet
another block the size of the new block, which will now
fit in the space vacated by the *old* block as well.  This
algorithm would perform as many copies as the algorithm
that uses growth factor 1.5 (despite the fact that it does
a redundant round of copying to compact the free store)
so it would still be at least as good, but it would have an
additional performance benefit of being able to postpone
the redundant copying until the space actually becomes
a concern.
I've said it in another article, but again... there are more
factors involved than a simplistic model can handle:
 - Constant allocation overhead in every block
 - Populating the grown vector may pollute free store with
    allocations that the straightforward calculation doesn't
    take into account (typically in a rate that grows at the
    same speed with the size of the vector).
 - Attempts to reuse too large of a portion of the already
    deallocated memory will render the algorithm useless
    after the first application of this optimization, because
    the algorithm *itself* will then have polluted the ideal
    free store.
I don't know... somebody should probably make a more
rigorous study of this topic - if not for anything else, it is
an interesting question.  Unfortunately I don't have time
for it myself right now.  (Or rigor, for that matter :-) ).
My bottom line is that it's debatable whether the growth
factor 1.5 yields any benefit against 2 .
Cheers!
- Risto -
That doesn't work because it can't free the memory for the k*s
elements before allocating the memory for the k*k*s elements.
However, at the (n+2)th allocation, all the memory for the 1st
to nth allocations is available.
I tried this out with a spreadsheet.  If k = sqrt(2) then a
"hole" of freed memory preceding the current allocation can be
used every 4 allocations (after some initial variability); if
k = 1.5 then every 5 allocations; if k = 1.618 then every 21
allocations; if k = (1+sqrt(5))/2 then never except for a few
initial successes due to rounding errors.
Nope.  The problem is that the new memory has to be allocated before the old
memory is deallocated, which means that even if the hole is big enough after
deallocation, it won't be before deallocation.
Here's an example.  Suppose, optimistically, that there is no overhead
associated with memory allocation.  Suppose also that you start with a
1-element vector and grow it by a factor of two each time.
Then you start out with 1 element.  The first time you want to grow, you
must allocate 2 elements, then free the 1, leaving a 1-element hole followed
by 2 elements.  Next time, you allocate 4, then free 2, leaving a 3-element
hole followed by 4 elements.  In general, after each allocation, you will
have an (n-1)-element hole followed by n elements.
So the question about whether the hole is ever large enough to reuse is not
whether (n-1)>=n, which is almost true, but whether (n-1)>=2n, which isn't
close to true.
[snip]...
> Presumably the reason for (1+sqrt(5))/2 is...
> 
> Initial allocation is s.
> The first resize is k*s.
> The second resize is k*k*s, which'll fit the hole
> iff k*k*s <= k*s+s  I.e. iff k <= (1+sqrt(5))/2
> 
> ...the hole can be recycled asap.
> 
If you take the upper bounds of the product, then every 6th allocation
starting form 1 as the initial size would recycle the previous memory
usinf a first fit allocation starategy, but most good allocators use some
sort of hybrid of all the allocation strategies, like dmalloc, which is
pretty effective!!!, and even I've written allocator policy, which is not
that straight forward. I guess that if the current vector is not the only
vector, or even the only container in the current program that's using the
heap, then the factor of 2 should do fine. There is no concrete proof I
guess to choose 1.5 over 2 apart form the fact that if (1). vector is the only
container around, and (2). the allocation starategy is first fit, then the
performance will get better. But the probability of 1 and 2 happening at
the same time is very small, so 2 seems to be fine, and so does 1.5 to me.
(Others might disagree). dmalloc and the default malloc on linux is pretty
efficient, and I guess it is VMM aware, so no matter how much you allocate
without reserve, the memory usage is only what is being used (capacity),
and not size. On the other hand dmalloc uses mmap, which is again a
directly VMM mapped memory access, so the virtual memory manager can
handle that. However, if I use the same driver program with my pool
allocator, then the memory usage shoots up by at least 50%. Not that the
allocator is inefficient, but the allocation starategy is such that this
kind of memory usage is not supported. What I use is a compination of
first fit, next fit, and previous fit algorithms + checks on what the size
of the last deallocated block was, depending on the current state of the
allocator, 1.5 will perform better on such an allocation scheme given that
the grow size of the pool is suitably adjusted. Also, after some threshold
has been reached, then this will tend to calling malloc for huge sizes.
> It could, by storing its previous size, grow fibonaccily.
That is one good solution!!!, but memory usage would be again a problem of
concern.
Regards,
-Dhruv.
Which leads me to wonder if using a Fibonacci series growth would 
satisfy the amortised constant time requirement of the C++ Standard for 
vector.
-- 
Francis Glassborow      ACCU
If you are not using up-to-date virus protection you should not be reading
this. Viruses do not just hurt the infected but the whole community.
In my code, nothing explicitly relies on the growing factor, it's just
ordinary vector manipulation. And neither does the performance
precalculation rely on it. But if and when I take the code to a
different platform and run benchmarks and some fail, I don't want it
to be because of a change of vector's factor. If something can be
equivalent across platforms, then I'd prefer it to be such, so that we
could concentrate porting efforts on real issues.
And it's a flaw of the Standard IMO that it doesn't mandate the
growing factor.
- Vahan
 > > I fail to see how an implementation detail such as the growth factor can
 > > possibly effect portability.
 >
 > My definition of portability (admittedly not necessarily universally
 > accepted) is the inverse of the effort it takes to take the code to
 > another platform and make it work. This includes the performance: if
 > the code compiles on a different platform but performs noticeably
 > worse, we usually don't consider porting complete.
I basically agree. I've long maintained that portability is a measure
of the cost of moving code vs. rewriting it. So have you *noticed* that
1.5 is unacceptably worse than 2 for a growth factor?
 > > Yes, but that is not what we mean by portability. There are many aspects
 > > of performance that change with new releases of compilers.  Actually the
 > > amount and types of available memory influences performance in a far
 > > more dramatic way. The existence of large cache memory actually means
 > > that vector<T> will often out perform list<T> for insertions even though
 > > list<T> is theoretically much superior when uniform memory is used.
 >
 > I agree that there are many factors beyond our control that affect
 > portability of performance of our code. File system operations may
 > have different complexities under different OS's, compilers are
 > different, etc. In some casese those things force rewrites (as in the
 > case you've mentioned). That's the inevitable evil. However, in this
 > case:
 >
 > a) This is something under our control, so why introduce an additional
 > problem? I for one was unpleasantly surprised when I found out that
 > the factor changes from platform to platform.
Maybe the problem also provides one or more *solutions*. Again, was your
unpleasant surprise the result of a performance measurement or just an
idle conjecture?
 > Why not use a common default and use an extra integer template
 > parameter that specifies the reallocation factor (in percents, for
 > instance)?
The C++ Standard doesn't let you add template parameters.
 > b) Containers and algorithms are very important determinants of
 > performance, if not the most important ones. I don't really care if
 > mkdir works slower or faster than CreateDirectory, that won't affect
 > the performance of my program. But I would like to be certain about
 > the characteristics of the containers, because the performance of
 > programs crucially depends on them.
And containers have tightly constrained *complexity* requirements.
Implementation details can often yield linear factors of two or more,
which swamp any difference between 1.5 and 2 growth rates.
 > Then of course I can write my own containers and not use STL. Yet, it
 > has been serving me well mostly, and I'd like to think of it as a
 > serious and usable library.
It is. Do you have any performance data that indicates you'd be better
off writing your own?
 > Some portability (my definition) issues that I've encountered in STL:
 >
 > 1. The std::list::size/splice performance. One has to be careful not
 > to use std::list's size() thoughtlessly, or write a wrapper, or
 > something. IOW, perform additional effort.
Or measure to see if the difference amounts to a hill of beans before
you go to all that work.
 > 2. Hash tables. Unlike std::set, which, in my experience, behaves
 > pretty uniformly, hash performance differs significanly. There are
 > many implementation strategies, policies involved. I prefer to drag a
 > single implementation from platform to platform, rather than rely on
 > the platform-specific implementation. Btw, last I checked, the
 > proposal for hash-tables doesn't eliminate some of the choices (but I
 > may be outdated on this), and that's, in my view, a defect.
Then submit a proposal to fix it.
 > > The innovative feature of the C++ Standard Library was the provision of
 > > certain performance guarantees.
 >
 > What I suggest is that perhaps the guarantees should be even stricter.
Be interesting to see how you'd write guarantees of linear performance
factors.
 > > >It's hard to tell which factor is best in the absolute sense, so I'd
 > > >prefer that to be a parameter. Just changing a hard-coded number
 > > >doesn't seem a good solution to me.
 > >
 > > Possibly, but hard coded values do allow certain optimisations and in
 > > another post Andy Koenig points out the a growth factor of 1.5 allows a
 > > memory usage optimisation that is not available to systems using a
 > > factor of 2.0
 >
 > It's not the specific factor that concerns me, it's my unawareness of
 > the factor when I move to another platform.
And 99 per cent of the factors that change between platforms are unimportant.
It's only when you *measure* performance and find that it fails to meet
objective specifications that you should begin to care. You'll then typically
find that the things you have to tweak are nowhere near where you first to
look.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
[With a typo correction of the factor being (1+sqrt(5))/2.]
Yes, but shouldn't we ask also the guy providing the allocator, what
kind of allocating and pooling strategy he uses?  And does the scheme
remain optimal, if we do other allocations or grow two vectors, for
instance?
I'm not arguing that 1.5 is wrong number, but I'm pondering whether
there are any right numbers.
-- 
Jouko
>  > Second, how do we multiply the size by 1.5?  Do you divide by 2 then
>  > multiply by 3, or what?
>
> Does it matter ?
>
> (i don't think you are talking about some performance issue, because the
> allocation and copy is most likely to consume more than 99% of the time
that
> the function will be running)
I meant in terms of avoiding overflow.  Multiplying by 3 first might put us
over the maximum.  Converting to double for multiplying by 1.5 could
introduce roundoff error.  I like Francis' solution in the other post.
--
+++++++++++
Siemel Naran
>Why not use a common default and use an extra integer template
>parameter that specifies the reallocation factor (in percents, for
>instance)?
I've seen *big* differences between implementations even
at other levels so if you really wanna write efficient
portable code I think the only solution is to actually
write it and analyze it on the platforms you want to
support. Sometimes the best solution for a compiler is
the worst for another; if you need to squeeze the CPU
in both I can't see anything portable you can do except
the good old #ifdef.
Even on the same compiler and on compatible hardware the
best methods may vary with the exact processor and
architecture, so I've even seen performance test about
different strategies done at runtime.
The difference may be *huge* ... I remember once I
timed a 1000% difference in execution speed by just
changing array size (with implications NOT on the
number of elements being used, but just their address).
Also I've no computer science university level education,
but what does it mean saying O(n) considering that n has
an upper limit ?
Andrea
 > In article <3faa855e$0$252$ed9e...@reading.news.pipex.net>, Stephen
 > Howe <NOSPAM...@dial.pipex.com> writes
 > > > Also, where did the sqrt(5) come from?
 > >Andrew Koenig means Phi, the Golden Ratio, the ratio of 2 successive terms
 > >that the Fibonacchi series converges to  1,2,3,5,8,13,21,34,55,...
 > Which leads me to wonder if using a Fibonacci series growth would
 > satisfy the amortised constant time requirement of the C++ Standard for
 > vector.
Yes.  The number of copies performed to reach any size is less than 4 *
size.  Linear as required to give amortized constant time.
John
Yes -- exponential growth will satisfy the anortized constant time
requirement
regardless of the exponent, and Fibonacci growth is exponential growth with
an exponent of (1+sqrt(5))/2, at least asymptotically.
 > Also I've no computer science university level education,
 > but what does it mean saying O(n) considering that n has
 > an upper limit ?
O is a measure of complexity and expresses the cost of performing a certain
operation, for example in terms of time or resource usage. O(n) means that
the cost of an operation is linear with the number of times you execute the
operation, meaning for example that if:
         n == 1, executing the operation has a total cost of 1
         n == 2, executing the operation has a total cost of 2
         n == 3, executing the operation has a total cost of 3
etcetera, while O(n^2) means:
         n == 1, executing the operation has a total cost of 1
         n == 2, executing the operation has a total cost of 4
         n == 3, executing the operation has a total cost of 9
etcetera.
-- 
Ruurd
.o.
..o
ooo
 > And it's a flaw of the Standard IMO that it doesn't mandate the
 > growing factor.
It does not do that because mandating the growing factor might have
different consequences on one platform than on another platform. Maybe not
in this case, but certainly in other cases. Plus the standard deals with
the language and its interpretation, not how compiler makers implement it.
-- 
Ruurd
.o.
..o
ooo
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
 > Here's an example.  Suppose, optimistically, that there is no overhead
 > associated with memory allocation.  Suppose also that you start with a
 > 1-element vector and grow it by a factor of two each time.
 >
 > Then you start out with 1 element.  The first time you want to grow, you
 > must allocate 2 elements, then free the 1, leaving a 1-element hole
 > followed by 2 elements.  Next time, you allocate 4, then free 2, leaving a
 > 3-element hole followed by 4 elements.  In general, after each allocation,
 > you will have an (n-1)-element hole followed by n elements.
 >
 > So the question about whether the hole is ever large enough to reuse is
 > not whether (n-1)>=n, which is almost true, but whether (n-1)>=2n, which
 > isn't close to true.
Meaning that ~1.5 is optimal if there is no in-between allocation of some
other sort, right? Tell me please, did someone benchmark this in a 'real
world' situation and what were the conclusions?
-- 
Ruurd
.o.
..o
ooo
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
That is an interesting question.  Sorting algorithms are, generally
speaking, O(NLogN).  LogN on modern day machines is 32 or 64, so the
algorithm is really O(64*N), or just O(N).  In other words, a linear
algorithm can easily be worse than an NLogN algorithm as the constants
in the linear algorithm begin to creep up, and the constants in the
NLogN algorithm creep down.
joshua lehrer
factset research systems
NYSE:FDS
But doesn't this mean that sometimes you're wasting more memory,
because the hole has to be larger, so that it can contain the next
allocation?
With a factor of 1.5, before an allocation which reuses the hole, the
hole has to be more than 1.5 times the size of the currently allocated
memory. But if you're using a factor of 2 the hole will be about the
same size as the currently allocated memory.
Daniel
>That is an interesting question.  Sorting algorithms are, generally
>speaking, O(NLogN).  LogN on modern day machines is 32 or 64, so the
>algorithm is really O(64*N), or just O(N).
This makes no sense.  Please show a sorting algorithm on a 'modern day
machine' that is O(n).  Where n is the number of elements in the
array.
-- 
Ed Avis <e...@membled.com>
>          n == 1, executing the operation has a total cost of 1
>          n == 2, executing the operation has a total cost of 2
>          n == 3, executing the operation has a total cost of 3
Not quite.  Saying f(n) = O(n) means that there exists a value k such that
f(n) <= k * n for all n>0.  So if we say that something costs O(n), it means
that there exists a k such that if:
    n == 1, executing the operation costs no more than k
    n == 2, executing the operation costs no more than 2*k
    n == 3, executing the operation costs no more than 3*k
and so on.
It's optimal in this one specific sense.  It is not necessarily optimal in
other senses, which may be part of the reason that some implementations use
1.5 and others use other growth factors.
> Andrea Griffini <agr...@tin.it> wrote in message news:<km4pqvgo6n3afnq2p...@4ax.com>...
>  > Also I've no computer science university level education,
>  > but what does it mean saying O(n) considering that n has
>  > an upper limit ?
> That is an interesting question.  Sorting algorithms are, generally
> speaking, O(NLogN).  LogN on modern day machines is 32 or 64, so the
> algorithm is really O(64*N), or just O(N).  In other words, a linear
> algorithm can easily be worse than an NLogN algorithm as the constants
> in the linear algorithm begin to creep up, and the constants in the
> NLogN algorithm creep down.
There seems to be some confusion between the base two log of a number
and the number of bits to represent a number.  Lg(1024) is 10 regardless
of whether the 1024 is a short, long or long long.
O() describes behavior as N goes to infinity.  At and above some N, an
O(N) algorithm will be faster than an O(NlgN) algorithm.  For small N,
the O() notation may be meaningless.
To answer the original question, stating that push_back is an O(N)
operation says that the number of copies made of all objects while doing
N push_back operations is less than some constant times N regardless of
how large N gets.  It also allows larger than that number for very
small N.  Consider multiplier 2.
   N     C     C/N
   1     1     1
   2     3     1.5    The first needed copied again
   3     6     2      The first two copied again, capacity is 4
   4     7     1.75   No increase in capacity
   5    12     2.4    Capacity is 8
   8    15     1.88
   9    24     2.66   Capacity is 16
  16    31     1.94
If you continue, you will find that for powers of two, C/N approaches
but never reaches two.  For powers of two plus one, C/N approaches but
never reaches three.  Expanding by two is linear, O(N), because the
total number of copies done is never more than 3*N.
Changing the multiplier to 1.5 changes those constants to three and
four.
Consider expanding by adding 8 each time.  This is an O(N^2) algorithm
and is not allowed by the standard.
   N     C     C/N    Initial capacity is 8
   1     1     1
   8     8     1
   9    17     1.9    Capacity 16
  16    24     1.5
  17    41     2.41   Capacity 24
  24    48     2
  25    73     2.92   Capacity 32
  32    80     2.5
  33   113     3.42   Capacity 40
  40   120     3
  41   161     3.9    Capacity 48
  48   168     3.5
As you can see, C/N is increasing continuously because the algorithm
is O(N^2) not O(N).  However, it looks good for small N.
John
> ...
 > Also I've no computer science university level education,
 > but what does it mean saying O(n) considering that n has
 > an upper limit ?
> ...
This is a bit off-topic, I think, but only a bit.
I notice that the two folks replying to this statement gave some numerical
examples of O() notation to explain the concept.  I would hastily add that
when we start talking O()s actual numbers can get in the way.
When we say that O(n) is "cheaper" than O(n^2) there are always cases where
we can write a solution that, for some limited range of inputs, appears to
contradict the statement.  If we have a sorting problem, and we are going
to sort 4 values, then a bubble sort _might_ be faster than a heap sort,
even though bubbles are O(n^2) and heaps can support O(n*log(n)).  A bubble
sort has real simple setup, and a heap sort needs some preparation.  It's
not till we want to sort tens or hundreds or thousands of data that O()
starts to pan out.  And it depends on the platform, and on what kinds of
values.
O() is about the shape of the curve that gets plotted.  A line is a line in
O(), it doesn't matter how steep the line is.  These O() characteristics
tend to be noticable on a graph that you plot way to the right.  If my hard
drives keep getting larger, and my CPU keeps getting faster, and data set
keeps getting bigger, then eventually, no matter how messy my O(n*log(n))
algorithm is, it's going to be faster than any O(n^2) no matter how clean.
-- 
An HWND is a terrible thing to waste.
 > usene...@lehrerfamily.com (Joshua Lehrer) writes:
 >
 > >That is an interesting question.  Sorting algorithms are, generally
 > >speaking, O(NLogN).  LogN on modern day machines is 32 or 64, so the
 > >algorithm is really O(64*N), or just O(N).
> This makes no sense.
Agreed.
 > Please show a sorting algorithm on a 'modern day
 > machine' that is O(n).  Where n is the number of elements in the
 > array.
Surely you know that radix sort is O(n).  It takes less than k*n
operations to sort n items.  It is possible that the k may be
larger than lg(n) for any reasonable n.  For sorting items with a
32 bit unsigned int key, k == 4 is common.  That is fast if the
space overhead can be tolerated.
John
No confusion here.  On a modern day machine, LogN has an upper bound
of 32 for 32-bit machines, and 64 for 64-bit machines.  So, an
O(NLogN) algorithm can easily be better than an O(N) algorithm, if the
latter has significant yet discarded constants, and the former has low
constants.
joshua lehrer
factset research systems
NYSE:FDS
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
 >Not quite.  Saying f(n) = O(n) means that there exists a value k such that
 >f(n) <= k * n for all n>0.  So if we say that something costs O(n), it means
 >that there exists a k such that if:
 >
 >    n == 1, executing the operation costs no more than k
 >    n == 2, executing the operation costs no more than 2*k
 >    n == 3, executing the operation costs no more than 3*k
 >
 >and so on.
The definition that includes arbitrary constants is the one I was
used to. But O(n) to be meaningful requires no upper limit on n...
otherwise anything is O(1) given an appropriate constant time.
Sometimes I see statements about complexity in standard algorithms
that are perfectly clear (for example an explicit upper limit on
the number of comparision or exchanges as a function of n) and
sometimes ones that are less clear (like for example requiring
"logarithmic complexity" in a world where n can't go to infinity).
What does the latter really mean ? that the *abstract* algorithm
is guaranteed to be O(log n) supposing there is no limitation
on n ? Sometimes things may get tricky... for example even
manipulating just indexes is not a fixed time operation if the
indexed area has to go to infinity...
I think formally even a list insertion can't be classified as O(1)
if for example even just a counter of nodes has to be kept updated
(but I'm not sure about this... like I said I never read a formal
explanation of O(...) theory in computer science).
Andrea
This is true once have been precised that only the biggest factor is
considered relevant...
Chris
> No confusion here.  On a modern day machine, LogN has an upper bound
> of 32 for 32-bit machines, and 64 for 64-bit machines.  So, an
> O(NLogN) algorithm can easily be better than an O(N) algorithm, if the
> latter has significant yet discarded constants, and the former has low
> constants.
No, why? Why shouldn't I be able to sort more that 2^32 elements on a
32 bit machine? I do not need to index them, or keep them in memory to
make "sorting" a valid operation. And even if I would need indices, I
can still use "arbitrary precision" numbers. Sure all that takes quite
a lot of time then to perform, but that's exactly the point... (-;
So long,
	Thomas
> >That is an interesting question.  Sorting algorithms are, generally
> >speaking, O(NLogN).  LogN on modern day machines is 32 or 64, so the
> >algorithm is really O(64*N), or just O(N).
> This makes no sense.  Please show a sorting algorithm on a 'modern day
> machine' that is O(n).  Where n is the number of elements in the
> array.
I think you missed his point.  The algorithm used my be O(NLogN), but on
a modern machine, address space constraints mean that the LogN part can
never be greater than 64, or maybe only 32 (on a PC, for example) -- and
that only for the case of an array of char which occupies most of the
memory.  And if we replace LogN by this maximum limit, the algorithm
suddenly becomes O(N).
The easiest example I can think of: an std::set vs. the proposed
std::unordered_set.  Access time for std::set is O(LogN).  With a good
hash code, access time for std::unordered_set is O(1) -- constant.
However, given the finite memory of machines (and I don't know of any
exceptions), LogN will have a finite upper bound in practice -- on a PC,
it will be less than 32.  If in order to get a good hash, you use a
function which increases the constant factor by a factor of 32, std::set
will beat std::unordered_set for all possible sizes.  What is the
significance of knowing that asymtotically, the performance will be
better for std::unordered_set, if it only becomes better when the number
of elements is greater than 2^100, and you can only fit 2^20 in memory?
(Luckily for std::unordered_set, there exist some very good hashing
algorithms which are also very fast.  At least for strings and other
orders lists of integral values.)
--
James Kanze           GABI Software        mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/     http://www.gabi-soft.fr
                    Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
> [With a typo correction of the factor being (1+sqrt(5))/2.]
> Yes, but shouldn't we ask also the guy providing the allocator, what
> kind of allocating and pooling strategy he uses?  And does the scheme
> remain optimal, if we do other allocations or grow two vectors, for
> instance?
> I'm not arguing that 1.5 is wrong number, but I'm pondering whether
> there are any right numbers.
There isn't any one "right number".  If speed is critical, copying is
expensive, and you've got plenty of memory, 3 or 4 might even be better.
If memory is tight, speed not too critical and your containers only hold
built-in types, you might prefer less than 1.5, so that the hole will be
reused earlier.  For many platforms, for built-in types and some POD's,
I imagine that using mmap or its equivalent to simply change the start
address of the data to some place where there is enough free memory
would be the fastest (sort of a realloc which will almost never copy) --
I don't know whether this would be a legal implementation, however,
since the standard says that the standard allocator calls ::operator
new, which can be user provided.
--
James Kanze           GABI Software        mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/     http://www.gabi-soft.fr
                    Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
I think the point is that algorithms that might appear to be O(N) may really
be O(NlgN) if the amount of time necessary to increment an integer grows as
the number of bits in the integer.  That's part of the reason that
algorithmic complexity is often described in terms of abstract operations,
where adding integers <= a constant times N is considered to be one
operation.
A benchmark is a measure of performance.  Performance may 
be adequate or inadequate, but a measure of performance 
does not pass or fail; it's just a measure.  
> And it's a flaw of the Standard IMO that it doesn't mandate the
> growing factor.
I don't consider it so.  If the optimum factor differs 
significantly among platforms, why penalize some platforms 
to benefit others?  
The C++ standard in spirit is very nonrestrictive.  If 
we want more restrictive systems, we might choose Java 
or C#.
> No confusion here.  On a modern day machine, LogN has an upper bound
> of 32 for 32-bit machines, and 64 for 64-bit machines.  So, an
> O(NLogN) algorithm can easily be better than an O(N) algorithm, if the
> latter has significant yet discarded constants, and the former has low
> constants.
Still confusion.  Lg(N) is a function of N.  What you are claiming is
that a K*N*lgN algorithm is faster than a C*N algorithm if C/K is
greater than 64.  It is faster when C/K/lgN is greater than 1.  This
is often true for N < 100 but reverses rapidly.  Maybe you could give
an example where the C is large enough to make all NlgN algorithms
faster for all N < 2^64?
Maybe you could also show how this applies to vector growth?  The
exponential growth is O(N) while linear growth is O(N*N).  On modern
computers, what value of K for added amounts will make the linear
growth faster than the exponential growth?
John
> You wrote:
> > I'd be intersted if any other vector implemenations uses a growth
> > factor other than 2, and I'd also like to know whether VC7 uses 1.5
> > or 2 (since I don't have that compiler here).
> IBM's VisualAge C++ 5.0.2.2 on AIX also seems to use a growth
> factor of 1.5.
IBM's VisualAge and Microsoft VC++ both use the same library.
--
James Kanze           GABI Software        mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/     http://www.gabi-soft.fr
                    Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
And 16 for 16 bit machines.  FWIW: a well designed shell sort is
O(N^1.2), heap sort is guaranteed O(NLogN).  On the 16 bit machines I
worked on, however, there was no possible array for which heap sort was
faster than shell sort.  (If I recall right, the break-over point was
calculated to be somewhere around 400 000 elements -- you needed that
many elements before heap sort became faster than shell sort.  And 
400 000 elements, on a 16 bit machine, just wasn't possible.)
--
James Kanze           GABI Software        mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/     http://www.gabi-soft.fr
                    Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
> > Andrea Griffini <agr...@tin.it> wrote in message
> > news:<km4pqvgo6n3afnq2p...@4ax.com>...
> >  > Also I've no computer science university level education, but
> >  > what does it mean saying O(n) considering that n has an upper
> >  > limit ?
> > That is an interesting question.  Sorting algorithms are, generally
> > speaking, O(NLogN).  LogN on modern day machines is 32 or 64, so the
> > algorithm is really O(64*N), or just O(N).  In other words, a linear
> > algorithm can easily be worse than an NLogN algorithm as the
> > constants in the linear algorithm begin to creep up, and the
> > constants in the NLogN algorithm creep down.
> There seems to be some confusion between the base two log of a number
> and the number of bits to represent a number.  Lg(1024) is 10
> regardless of whether the 1024 is a short, long or long long.
So. I don't see any confusion.
> O() describes behavior as N goes to infinity.  At and above some N, an
> O(N) algorithm will be faster than an O(NlgN) algorithm.  For small N,
> the O() notation may be meaningless.
Right.  Nobody is disagreeing what O means when you approach infinity.
In this case, however, we are applying it to containers which are in
memory.  I don't know about you, but the machine on my desktop doesn't
have an infinity of memory.  It can't even address memory beyond 2^64
bytes (and I don't have that much installed, even with virtual memory).
Which means that the number of elements in a container can never exceed
2^64, and in practice, will be much smaller.  (I've only got 1 GB swap
space, personally.)  This sets an upper limit on N, which in turn sets
an upper limit on LogN.  And while the upper limit on N is still pretty
large, the upper limit on LogN isn't really that much.
> To answer the original question, stating that push_back is an O(N)
> operation says that the number of copies made of all objects while
> doing N push_back operations is less than some constant times N
> regardless of how large N gets.  It also allows larger than that
> number for very small N.  Consider multiplier 2.
>    N     C     C/N
>    1     1     1
>    2     3     1.5    The first needed copied again
>    3     6     2      The first two copied again, capacity is 4
>    4     7     1.75   No increase in capacity
>    5    12     2.4    Capacity is 8
>    8    15     1.88
>    9    24     2.66   Capacity is 16
>   16    31     1.94
> If you continue, you will find that for powers of two, C/N approaches
> but never reaches two.  For powers of two plus one, C/N approaches but
> never reaches three.  Expanding by two is linear, O(N), because the
> total number of copies done is never more than 3*N.
On my machine, using a constant incremental expansion factor of about
200MBytes will probably outperform any multiplier for as long as the
array fits in memory.
> Changing the multiplier to 1.5 changes those constants to three and
> four.
> Consider expanding by adding 8 each time.  This is an O(N^2) algorithm
> and is not allowed by the standard.
>    N     C     C/N    Initial capacity is 8
>    1     1     1
>    8     8     1
>    9    17     1.9    Capacity 16
>   16    24     1.5
>   17    41     2.41   Capacity 24
>   24    48     2
>   25    73     2.92   Capacity 32
>   32    80     2.5
>   33   113     3.42   Capacity 40
>   40   120     3
>   41   161     3.9    Capacity 48
>   48   168     3.5
> As you can see, C/N is increasing continuously because the algorithm
> is O(N^2) not O(N).  However, it looks good for small N.
Now consider the case of adding 200000000/sizeof(T) each time.  It also
looks good for small numbers.  And before it starts looking bad, there
will be a bad_alloc anyway.
Abstractly, the basic question is: what is the significance of a measure
which is only significant as N approaches infinity if there is a strict
upper bound on N.
--
James Kanze           GABI Software        mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/     http://www.gabi-soft.fr
                    Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
It's always nice to discover another application for the Golden Mean. 
I'm curious about one detail.  Why not use 1.6, which is closer to the
Golden Mean and to the old value?  I suppose it might make computing
the new size more expensive, but the incremental cost is insignificant
compared to the cost of allocation and copying.  (Avoiding overflow is
harder, but I suspect that many implementations don't do this now. 
For example, gcc 3.3 just uses 2 * __old_size.  Of course, anyone who
needs a vector of 2 billion elements ...)
Coincidentally, I once knew a man who organized events (e.g. banquets)
for a university.  He taught me that the old heuristic of doubling
estimates in order to produce requirements was inferior to one he'd
discovered empirically.  He multiplied by 1.6.
Felix
 > Sometimes I see statements about complexity in standard algorithms
 > that are perfectly clear (for example an explicit upper limit on
 > the number of comparision or exchanges as a function of n) and
 > sometimes ones that are less clear (like for example requiring
 > "logarithmic complexity" in a world where n can't go to infinity).
All STL complexities are limits on operations on the type T involved.
Making a copy of a vector<vector<string> > > is linear in the number
of vector<string> > regardless of string length or inner vector size.
> I think formally even a list insertion can't be classified as O(1)
A list insertion is O(1) because it makes one use of the T copy
constructor.  Any time needed to find space, build a node, link it
in, update other stuff is considered constant.  It likely depends
more on the phase of the moon than the number of items in this list
anyway.
John
 > Not quite.  Saying f(n) = O(n) means that there exists a value k such that
 > f(n) <= k * n for all n>0.  So if we say that something costs O(n), it
 > means that there exists a k such that if:
 >
 >     n == 1, executing the operation costs no more than k
 >     n == 2, executing the operation costs no more than 2*k
 >     n == 3, executing the operation costs no more than 3*k
 >
 > and so on.
Ah. So O-notation is worst-case-cost. Correct?
-- 
Ruurd
.o.
..o
ooo
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
If you're too close to the Golden Mean, it takes a lot of allocations before
the hole can be reused, especially if there is a fixed amount of additional
overhead per allocated block.
 > > O() describes behavior as N goes to infinity.  At and above some N, an
 > > O(N) algorithm will be faster than an O(NlgN) algorithm.  For small N,
 > > the O() notation may be meaningless.
 > I think the point is that algorithms that might appear to be O(N) may really
 > be O(NlgN) if the amount of time necessary to increment an integer grows as
 > the number of bits in the integer.  That's part of the reason that
 > algorithmic complexity is often described in terms of abstract operations,
 > where adding integers <= a constant times N is considered to be one
 > operation.
I agree with your analysis.  It would come into play with Fibonacci
where the linear algorithm may be quadratic.
In this case, the argument seems to be that lgN can't exceed a certain
value for any algorithm because of the size of an int.  Sorting N items
has nothing to do with the bits in N, but the bits in N does have
something to do with how many items you can try to sort using
subscripts.  The argument presented was that lgN can't get large on a
real machine.  I can't seem to recall any algorithms with NlgN behavior
other than sorting.
I think the claim was that an algorithm which seems to be NlgN is really
N because lgN has an upper bound which may be used as the constant.  I
agree that an NlgN algorithm might be faster than an N algorithm if the
N algorithm has a constant multiplier larger than 32/64 times the NlgN
algorithm constant.  I just don't know of any examples.
James' Shell sort example is a nice example where the lgN must be small
does change the expectations.  With today's large memories, that is no
longer an example.  There may be others.
John
|>  > O() describes behavior as N goes to infinity. At and above some N,
|>  > an O(N) algorithm will be faster than an O(NlgN) algorithm. For
|>  > small N, the O() notation may be meaningless.
|>  I think the point is that algorithms that might appear to be O(N)
|>  may really be O(NlgN) if the amount of time necessary to increment
|>  an integer grows as the number of bits in the integer. That's part
|>  of the reason that algorithmic complexity is often described in
|>  terms of abstract operations, where adding integers <= a constant
|>  times N is considered to be one operation.
I think the point was rather that for algorithms which operate on in
memory objects, N has a finite upper bound, such that the maximum value
ln(N) can take is either 32 or 64.  And that these numbers aren't really
so outrageously high as to make the constant factor irrelevant.
Once you drop the "in memory" consideration, of course, there are a lot
of additional variables -- the speed of arithmetic operations increases
with size, as you mention, but there are also issues of access times,
etc.: the fastest file sort, for example, will probably be the one which
manages to do the least disk accesses, even if it uses twice the
comparisons.
-- 
James Kanze                             mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/
                 Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France  +33 1 41 89 80 93
> Ah. So O-notation is worst-case-cost. Correct?
Even more confusion here. "Worst case" usually means to find the
worst configuration for the data set such that the program gets "as slow"
or "as complex" as possible. This is not what is meant. It is perfectly
valid to say that "quicksort is O(n log n) in the average case and
O(n^2) in worst case". 
It is "worst case" in the sense that it is an upper limit for the complexity
given the side conditions. Thus, "quicksort is O(n log n) typically" means:
For all n, and all configurations C of size n, C(n) "typical", there is a 
constant k (independent of n and C, as long as C "typical") such that the
complexity is bound by k * n log n.
"typical" has to be specified in some way, of course.
Clearly, for all "finite" n, one can always find such k just by making
k large enough when testing all available (though finite) n. Saying
"something is of O(n log n)" thus makes only sense when you make n
large. "Something is of O(n) for limited n" means nothing. Everything is
"O(1) for finite n" so to say.
So long,
	Thomas
Asymptotic worst-case cost.  So, for example, an operation that takes time
proportional to n^2 when n < 1000000 and proportional to n^1.5 for n >=
1000000, the way to characterize it is as O(n^1.5), not O(n^2).
Surely headers are a help not a hinderance, or do mean something
different to what I'm thinking?
As I see it, if each allocation has a header of size h say
and we coalesce a pair of adjacent blocks of sizes s and t
then we have available, for recycling, a block of size s+t+h.
Pictorally...
[h:s][h:t] <-- blocks of size s and t can be recycled as...
[h:h+s+t] <-- ...a block of size h+s+t
...or did I have one too many at the Boat this lunch time?
Louis.
 >>Please show a sorting algorithm on a 'modern day machine' that is
 >>O(n).  Where n is the number of elements in the array.
 >
 >Surely you know that radix sort is O(n).  It takes less than k*n
 >operations to sort n items.  It is possible that the k may be larger
 >than lg(n) for any reasonable n.
It depends on what is 'reasonable' I guess.  Normally in O-notation
you're concerned with asymptotic complexity only, not with what's
faster for a certain range of n classed as reasonable.
Also shouldn't a sorting algorithm work given only an ordering
relation on the elements to be sorted?  Radix sort requires something
like a mapping from the element type to N, which in a way is asking
for more.  If you wanted to sort some thousand-character strings into
ASCII order, would radix sort be possible?  Yet there is certainly a
total ordering on the strings which any 'real' sorting algorithm would
be happy with,
-- 
Ed Avis <e...@membled.com>
"Andrew Koenig" <a...@acm.org> wrote in message
news:ZeUrb.45055$Ec1.3...@bgtnsc05-news.ops.worldnet.att.net...
>
> If you're too close to the Golden Mean, it takes a lot of allocations
before
> the hole can be reused, especially if there is a fixed amount of
additional
> overhead per allocated block.
Andrew's calculations seem to be based on minimizing the
growth rate, not in trying to utilize the maximum amount of
memory.  This approach starts from the wrong end of the
problem (namely from the point when the whole memory is
intact) and attempts to retain as much memory unused as
possible.  Unfortunately the low growth rate has the effect
of increasing the amount of copying as well.
I did a bit of analysis starting from the other end: from the
point when the memory is about to become exhausted.
Here's what I ended up with:
First, there is a lower limit as to how low the growth factor
can go and yet be useful: when two previous deallocations
are already large enough to contain a next allocation.  This
gives the lower limit 'x' where 'x^3 >= x+1', or 'x ~ 1.325'.
Then, there are two main cases:
1. The growth factor is between 1.325 and 1.618 .  Now
a subsequent allocation can more or less often fit into the
space vacated by previous deallocations, thereby reusing
some of the memory.
2. The growth factor is greater than 1.618 .  Now a new
block will never be able to fit into the space vacated by
previous deallocations.
In the first case, the best fit will be when the last allocation
(before memory exhaustion) snugly fits into the remaining
untouched free store, and the original vector resides at the
bottom of the free store, in the space vacated by previous
allocations.  The memory utilization will be '1 : 1+1/x'.
In the first case, the worst fit is when an allocation does not
fit into the bottom nor to the top of the free store before the
memory is exhausted.  Memory utilization will be '1 : 1+2x'.
In the second case, the best-fit will again be when the last
allocation snugly fits into the remaining free store.  This is a
bit harder calculation, but ideally the bottom gap is the sum
of the series '1/x+1/x^2+1/x^3?... = 1/(x-1)' and therefore
the memory utilization will be '1 : 1+1/(x-1)'.
In the second case, the worst-fit is when the last allocation
fails to fit in the same space .  The memory utilization will be
'1 : 1+1/(x-1)+x'.
Now some figures:
1.4 best fit 58% worst fit 26%
1.5 best fit 60% worst fit 25%
1.6 best fit 62% worst fit 24%
1.7 best fit 41% worst fit 24%
 . . .
2.0 best fit 50% worst fit 25%
 . . .
2.3 best fit 57% worst fit 25% (same worst fit with 1.5)
2.4 best fit 58% worst fit 24% (perhaps a good trade-off?)
2.5 best fit 60% worst fit 24% (same best fit with 1.5)
 . . .
2.7 best fit 63% worst fit 23%
 . . .
3.0 best fit 67% worst fit 22%
 . . .
4.0 best fit 75% worst fit 19%
 . . .
5.0 best fit 80% worst fit 16%
 . . .
etc.
So, using growth factor 2.4 gives nearly the same memory
utilization but much less copying than 1.5 .  Using a growth
factor greater than 1.618 actually has the benefit of giving a
very good memory utilization in the best fit, but the benefit
wears out in the worst fit when the factor grows past 3..4 .
The only clearly non-beneficial factors are those just above
the treshold of 1.618
Calculating a useful average fit requires using logarithmic
probability for the memory limit, so direct averaging of the
best and worst fits will unnecessarily favor one of them.  My
intuition, however, says that the ideal growth factor would
be none else than 'e = 2.718', the base of natural logarithm.
Cheers!
- Risto -
Assuming that the fixed size overhead always piggybacks
the block itself (i.e. is allocated and deallocated at the same
time), I would assume it works the other way:  The fixed
overhead of several past deallocations actually widen the
gap, whereas a single new allocation consumes the fixed
amount only once.
Cheers!
- Risto -
 >> Ah. So O-notation is worst-case-cost. Correct?
 >
 > Even more confusion here. "Worst case" usually means to find the
 > worst configuration for the data set such that the program gets "as slow"
 > or "as complex" as possible. This is not what is meant. It is perfectly
 > valid to say that "quicksort is O(n log n) in the average case and
 > O(n^2) in worst case".
Nonono, that's not what I meant here, I meant to say that (per the example
of Andrew) k is the maximum amount of cost incurred for O(n) where n == 1
when executing the algorithm.
-- 
Ruurd
.o.
..o
ooo
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
> John Potter <jpo...@falcon.lhup.edu> writes:
>  >>Please show a sorting algorithm on a 'modern day machine' that is
>  >>O(n).  Where n is the number of elements in the array.
>  >Surely you know that radix sort is O(n).  It takes less than k*n
>  >operations to sort n items.  It is possible that the k may be larger
>  >than lg(n) for any reasonable n.
> It depends on what is 'reasonable' I guess.  Normally in O-notation
> you're concerned with asymptotic complexity only, not with what's
> faster for a certain range of n classed as reasonable.
For internal sorting, a reasonable N is what you can fit in memory.
That was the point in claiming that O(NlgN) is really O(32N).
> Also shouldn't a sorting algorithm work given only an ordering
> relation on the elements to be sorted?
Only if you want to restrict it to NlgN performance.  It is impossible
to do better using compares.  Shouldn't a sorting algorithm work given
only a hashing function?  Compare map and hashmap.
> Radix sort requires something
> like a mapping from the element type to N, which in a way is asking
> for more.
It needs a hashing function rather than a compare function.  It makes
multiple passes over the data hashed to some radix.  The point being
that it is possible to write an O(N) sort if you do not compare.
> If you wanted to sort some thousand-character strings into
> ASCII order, would radix sort be possible?  Yet there is certainly a
> total ordering on the strings which any 'real' sorting algorithm would
> be happy with,
The hashing function is trivial and radix sort works in O(N) the k is
1000.  Some stats to give you an idea of what happens.  The left number
is the string length, the next is size and the others are times.
                        vector           list          radix
    2
           1000              0              0             16
          10000             47             15             16
         100000            453            313            187
        1000000           6688           4906           1860
    4
           1000              0              0              0
          10000             47              0             15
         100000            531            328            359
        1000000           7031           5063           3797
    6
           1000              0              0             16
          10000             15             16             47
         100000            531            328            516
        1000000           7047           5125           5766
    8
           1000              0              0              0
          10000             47             16             62
         100000            515            344            688
        1000000           7110           5093           7610
   10
           1000              0              0              0
          10000             47             15             78
         100000            516            343            907
        1000000           7047           5110          10000
The sorts using compare are mostly unaffected by the length of
the string.  Using radix sort on 1000 character strings would
work but would also be foolish.  It is O(N) and runs on modern
machines which is what you requested.
To be fair to sort and vector, we should also do ints.
         100000             16            109            110
        1000000            234           1859           1172
       10000000           2593          29563          15687
Just for a touch of reality, a 256 byte POD with an int key.
          10000             47             16             16
         100000            484            234            125
        1000000           6172           5719           1859
With the larger sizes, you can start to see virtual memory and
cache playing a bigger role.
John
A formula that ignores some part of reality can still be useful.
If going from N = 100 to 1,000 is a performance disaster it's irrelevant
that N has an upper limit of 1,000,000,000.
Also, you usually don't know what the upper limit is. It depends on the
computer so you might as well treat N as unbounded.
At the end, if you're seriously worried about performance you treat the
big O figure as an initial guide, then do proper estimates and
measurements.
   John
-- 
John Harris
The problem I have is that those using big O forget that it is a 
theoretical measure of complexity. In the real world one of our most 
important resources is memory and that is very far from being 
homogeneous. An algorithm with a very poor 'O' metric may in fact be a 
very good performer because of localisation properties that mean that 
there is no thrashing in memory access.
I find that very few CS graduates have a real understanding of the way 
that memory caching influences performance. If performance matters best 
choices can often depend on the exact configuration of hardware and the 
sizes of both on-chip caches and main memory.
In my early days of programming maximum performance involved 
consideration of which blocks of memory you were using because access to 
more remote memory (actually physically more remote) introduced delays. 
Later when writing in Z80 assembler for a Sinclair (I think it was 
called a Timex in the US) ZX Spectrum maximum performance meant using 
the upper 32K block of memory because the lower 32K could be interrupted 
for such things as screen updates.
The big O is fine for cases where other things are equal but in the real 
world that is rarely the case and programmers need to be aware of the 
real world factors. A program that may fly on a high end development 
machine may be virtually unusable on the target machine, whilst a poorer 
performer on the development machine may have perfectly adequate 
performance on the target. Do not just measure before optimising but 
measure on the target.
-- 
Francis Glassborow      ACCU
If you are not using up-to-date virus protection you should not be reading
this. Viruses do not just hurt the infected but the whole community.
>For internal sorting, a reasonable N is what you can fit in memory.
>That was the point in claiming that O(NlgN) is really O(32N).
Point taken - although again, O(32N) is 'really' O(32 * 2^32) which is
O(1).  The whole O-notation starts to look a bit silly when you
restrict to a certain maximum value of N.  It might be better to use
some other notation for considering real-world performance on
'reasonable' input data, since it's implicit in O-notation that there
is no upper limit on N.
>>Also shouldn't a sorting algorithm work given only an ordering
>>relation on the elements to be sorted?
>
>Only if you want to restrict it to NlgN performance.  It is
>impossible to do better using compares.
Yes this is what I meant.  I was thinking of a 'sorting algorithm' as
something that takes some elements and a <= relation on them, and
produces a sorted list.  Of course there are other ways of sorting the
data using some function which gives you more help than <=.  It may
seem odd that I claim that an algorithm which sorts is not necessarily
a sorting algorithm, but that's what I meant.  Perhaps I should use
some other phrase like 'classical sorting algorithm'.
>Shouldn't a sorting algorithm work given only a hashing function?
If your hashing function really is just a hash, then you cannot use it
on its own for sorting.  For example you cannot sort strings using
only their MD5 sums, because comparing md5(a) and md5(b) tells you
nothing about whether a <= b.  On the other hand if your hashing
function does give you ordering as well, then you've effectively got
<= plus some bonus information.
As far as I know a <= relation is the least possible 'help' required
by any sorting algorithm.  You can't implement one with less, and a
'classical sorting algorithm' does not require anything more.
Radix sort does require something more.
-- 
Ed Avis <e...@membled.com>
> John Potter <jpo...@falcon.lhup.edu> writes:
 > >For internal sorting, a reasonable N is what you can fit in memory.
 > >That was the point in claiming that O(NlgN) is really O(32N).
 > Point taken - although again, O(32N) is 'really' O(32 * 2^32) which is
 > O(1).  The whole O-notation starts to look a bit silly when you
 > restrict to a certain maximum value of N.  It might be better to use
 > some other notation for considering real-world performance on
 > 'reasonable' input data, since it's implicit in O-notation that there
 > is no upper limit on N.
We agree that claiming an O(NlgN) is O(N) is nonsense.  It has to do
with the relationship between increased time and increased size.  The
original claim that all NlgN algorithms are N algorithms on today's
machines is nonsense.  I think the point was that an NlgN algorithm
may be faster than an N algorithm for useful values of N.  That is
of course always possible and has nothing to do with O notation.
A point that could be made on the subject of this thread is that a
nonconforming O(N^2) growth algorithm could perform better than a
conforming O(N) growth algorithm.  Consider size += 1000 and
size = 1.000001 * size + 1.  The former is quadradic and will just
waste space for most real uses.  The latter is linear and will act
quadradic for most real uses.  The O requirements in the library are
worth nothing in court.  The specific requirements like no more than
N calls of the copy ctor are enforcable.  One must take the O
requirements as guidelines for uses of reasonable implementations
not mandates to implementors.
 > >>Also shouldn't a sorting algorithm work given only an ordering
 > >>relation on the elements to be sorted?
 > >Only if you want to restrict it to NlgN performance.  It is
 > >impossible to do better using compares.
> Yes this is what I meant.
I didn't realize that you were claiming that by definition all sorting
algorithms are O(NlgN) or slower.  Makes your request for an O(N)
algorithm seem a bit silly.
 > I was thinking of a 'sorting algorithm' as
 > something that takes some elements and a <= relation on them, and
 > produces a sorted list.  Of course there are other ways of sorting the
 > data using some function which gives you more help than <=.
You of course know that std::sort will crash and burn if given a <=
function.  That is not a strict weak order.  Something like < which is
more is required.
 > It may
 > seem odd that I claim that an algorithm which sorts is not necessarily
 > a sorting algorithm, but that's what I meant.  Perhaps I should use
 > some other phrase like 'classical sorting algorithm'.
I don't think that will work either.  Radix sort is older than any smart
algorithm except merge sort.  You should use the phrase algorithms which
sort by comparing keys.
> >Shouldn't a sorting algorithm work given only a hashing function?
 > If your hashing function really is just a hash, then you cannot use it
 > on its own for sorting.  For example you cannot sort strings using
 > only their MD5 sums, because comparing md5(a) and md5(b) tells you
 > nothing about whether a <= b.  On the other hand if your hashing
 > function does give you ordering as well, then you've effectively got
 > <= plus some bonus information.
The hashing function is effectively operator[] which gives more than
one hash none of which determines operator< but in combination they
do.  A rather inefficient operator< which produces a very efficient
sorting algorithm for small limits on the range of subscripts.
You may continue to claim that radix sort is not a sorting algorithm,
but I will use it to sort rapidly when the data fits the algorithm.
It sorts and it is O(N) whatever you call it.
John
My intuition votes for fibonacci, if only because it is the growth engine
of nature. Fibonacci growth has some properties that seem related to
effiecient memory allocation and deallocation.
As I understand from others, it is important to have fixed sized blocks when
dealing with memory. Growth 2 has a unit measure of 1, growth 1.5 has 0.5
as unit measure. Fibonacci seems to lack a clear unit, as it is something like
0.618... Hard to measure in fixed units. But it is easy.
I lack mathematics enough to avoid methematical terms. Some numbers must
do the talking. I start out with a vector of 1000 elements and have it grow
fibonacci-wise 5 times:
1000
1618
2618
4236
6854
11090
You can express these numbers like this:
int a=1000;
int b=618;
a;                //1000
a+b;            //1618
2*a+b;        //2618
3*a+2*b;    //4236
5*a+3*b;    //6854
8*a+5*b;    //11090
In words, you have 2 'units', a bigger and a smaller. Growth can be
expressed in terms of bigger and smaller. The fibonacci sequence itself
re-occurs in the calculation of the actual size. That is as somewhat as
expected but also reassuring. When 2 is too big and 1.5 is too small,
1.6180339... seems just right.
-X
|>  In article <d6652001.0311...@posting.google.com>, 
|>  ka...@gabi-soft.fr writes
|>     <snip>
|>  >Abstractly, the basic question is: what is the significance of a
|>  >measure which is only significant as N approaches infinity if there
|>  >is a strict upper bound on N.
|> A formula that ignores some part of reality can still be useful.
|>  If going from N = 100 to 1,000 is a performance disaster it's
|>  irrelevant that N has an upper limit of 1,000,000,000.
Certainly.  The discussion concerned the difference between O(n) and 
O(n ln n).  The important point is that ln n grows extremely slowly.
Nobody is arguing that the ^2 is irrelevant in O(n^2).
|>  Also, you usually don't know what the upper limit is. It depends on
|>  the computer so you might as well treat N as unbounded.
More or less.  For in memory algorithms, there is a real upper bound,
determined by the size of a pointer.  I don't know any modern machine
where this is greater than 64 bits, and I don't expect that to change
anytime soon.
|>  At the end, if you're seriously worried about performance you treat
|>  the big O figure as an initial guide, then do proper estimates and
|>  measurements.
Completely agreed.  The big O figure is an initial guide.  The whole
point is that the ln n part might not be significant for real data.
Only the ln n part, however -- the break even point between O(n) and
O(n^2) is rarely more than around a hundred elements (although one can
doubtlessly construct cases...)
The problem is where the break even point can be fairly large.  I
already posted an example where the break even point of real algorithms
was around 100000 elements: shell sort (O(n^1.2) and quick sort (typical
O(n ln n) -- O(n ln n) is asymtotically better than O(n^1.2), but if
your array can never hold more than 100000 elements, you're better off
with shell sort).
-- 
James Kanze                             mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France  +33 1 41 89 80 93
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
>>>>Also shouldn't a sorting algorithm work given only an ordering
>>>>relation on the elements to be sorted?
>
>>>Only if you want to restrict it to NlgN performance.  It is
>>>impossible to do better using compares.
>
>>Yes this is what I meant.
>
>I didn't realize that you were claiming that by definition all
>sorting algorithms are O(NlgN) or slower.
Well not by definition: my definition just says 'take some elements
and a <= relation, and sort them'.  Somebody has proven that
O(n log n) number of comparisons is the best you can do under these
circumstances, although I don't know what the proof is.
>Makes your request for an O(N) algorithm seem a bit silly.
I just wondered what another poster was talking about and asked for an
example (of what I thought was impossible) to clarify things.
>You of course know that std::sort will crash and burn if given a <=
>function.  That is not a strict weak order.  Something like < which
>is more is required.
OK - I believe that most sorting algorithms which can be implemented
in terms of <= can also be done with < and vice versa.
>Radix sort is older than any smart algorithm except merge sort.  You
>should use the phrase algorithms which sort by comparing keys.
OK.
-- 
Ed Avis <e...@membled.com>
Maciej Sobczak <no....@no.smap.com> wrote:
> Interesting, but it would not be able to meet the "amortized constant
> cost of unit grow" requirement.
Yes, it could. The Fibonacci sequence approaches exp(phi, n) in the
limit (where phi is the golden ratio), which is sufficient to meet the
amortized constant growth.
-- 
Bradd W. Szonye
http://www.szonye.com/bradd
Hm.  I had a Spectrum but I did not know that. :-(  Or I just do not
remember.
> The big O is fine for cases where other things are equal but in the
> real world that is rarely the case and programmers need to be aware
> of the real world factors. A program that may fly on a high end
> development machine may be virtually unusable on the target machine,
> whilst a poorer performer on the development machine may have
> perfectly adequate performance on the target. Do not just measure
> before optimising but measure on the target.
Half OT, full OT?  I had an old college in Hungary.  He was one of those 2,
who have made virtually bug free programs.  Of course how he has made it was
testing and rollback scenarios, but there is one point which matters here.
He has refused to get a new, highly powerful workstation.  He was working on
the slowest possible workstation all the time.  I mean as long as the
development environment did allow that, after that he had two: one for the
bloated development tools and one, the slowest possible one in the company
for testing.  Possibly even slower as the ones we have shipped with the
product.  The reason?  If the program was able to run nice on that machine,
it did run nice at any customer too.  The slow hard disk "emulated" slow
network environment and so on.
Once he called me into his office and told me that he is very happy he did
not give away his old computer.  On the development workstation his program
ran very fast, like under 10 minutes, on non-trivial size of live test data.
On the test PC it run for 8 hours!  Difference?  Memory for disk cache
revieled a programming error.  The workstation was able to cache the whole
database index in memory while appending to the database.  The old PC had no
disk cache, or incredibly small, so as the larger and larger indices under
creation became fragmented the number of seeking required to do the task
increased and with the smallest test example the code was running for more
than 8 hours.  Removing and not opening the indices made the process a
little bit faster on the development workstation and many times faster on
the old PC.  It has finished within 30 minutes.  Not only that.  The indices
created "cleanly" proved to provide faster access later - I guess the engine
was able to use a better algorithm when having the whole file at once.
I think the moral of that story is more true now than ever.  At that time
the systems were simple.  Very simple, compared to any everyday system
nowadays.  The number of unknown factors and "dynamic" behaviour has
increased dramaticly.  On a system today there are many processes running,
daemons, drivers, system processes and so forth.  Any non-trivial system is
closer to chaos (unpredictable in many ways) than to order.  To test such
systems weaknesses one has to put them under enormous stress.  It is kinda
like the saying: hope for the best, but be prepared for the worst.
Do you recall that little icon in the 16bot Windows SDK?  I do not know if
it is still there today...  The elephant, the icon of the stress
application.  I have tried at that time with a well known product (suite)
and it seemed that not everyone uses it where it was made...
With the higher complexity and more unpredictable behaviour of todays
systems it is much easier to create an application which cannot stand
stress.  And when that happen data loss or worse is the result, which is not
good for anyone.
So I completely agree with Francis.  And I wish that the designers of some
tools I have to use during my work... but that is really off topic. :-)
--
Attila aka WW
> >A formula that ignores some part of reality can still be useful.
> >If going from N = 100 to 1,000 is a performance disaster it's
> >irrelevant that N has an upper limit of 1,000,000,000.
> >Also, you usually don't know what the upper limit is. It depends on
> >the computer so you might as well treat N as unbounded.
> >At the end, if you're seriously worried about performance you treat
> >the big O figure as an initial guide, then do proper estimates and
> >measurements.
> The problem I have is that those using big O forget that it is a
> theoretical measure of complexity.  In the real world one of our most
> important resources is memory and that is very far from being
> homogeneous.  An algorithm with a very poor 'O' metric may in fact be
> a very good performer because of localisation properties that mean
> that there is no thrashing in memory access.
Yes and no.  There is no doubt that things like locality can greatly
affect performance.  There is also no doubt in my mind that we generally
have very little control over it -- I can remember one specific case
where adding or removing a variable in main made a difference (by a
factor of 3 or 4, IIRC) in the execution time of a function called
before the variable was in scope.
On the other hand, it is important to realize that the claim here only
concerns the difference between O(n) and O(n ln n), and is made because
ln n will always be relatively small for in memory algorithms.  There is
no general claim, and the difference between O(n) and O(n^2) will be
significant, regardless of the locality.  Going from 10 to 1000
elements multiplies the factor by 100 in one case, and by 10000 in the
other, and it's going to take a lot of bad locality to offset that kind
of difference.
> I find that very few CS graduates have a real understanding of the way
> that memory caching influences performance.  If performance matters
> best choices can often depend on the exact configuration of hardware
> and the sizes of both on-chip caches and main memory.
> In my early days of programming maximum performance involved
> consideration of which blocks of memory you were using because access
> to more remote memory (actually physically more remote) introduced
> delays.  Later when writing in Z80 assembler for a Sinclair (I think
> it was called a Timex in the US) ZX Spectrum maximum performance meant
> using the upper 32K block of memory because the lower 32K could be
> interrupted for such things as screen updates.
I remember being told that when the mainframes moved to using virtual
memory, instead of explicitly managed overlays, performance when down.
Because when you had to load an overlay explicity, you took pains to do
it as little as possible.  I've also encountered specific cases where
reading a file blockwise into a buffer, and processing each block, was
faster than using mmap.
Convenience is not without cost.
> The big O is fine for cases where other things are equal but in the
> real world that is rarely the case and programmers need to be aware of
> the real world factors.
The point of big O, generally, is that the difference rapidly becomes so
large that all other things are irrelevant.  The relevance of big O
depends on just how rapidly.  The argument is that the change isn't very
rapid when comparing O(n) and O(n ln n).  Big O is still very relevant
as soon as there is an integral power difference or more: O(n)
vs. O(n^2), for example, or O(n) vs O(2^n).  Quicksort still beats
bubble sort.
> A program that may fly on a high end development machine may be
> virtually unusable on the target machine, whilst a poorer performer on
> the development machine may have perfectly adequate performance on the
> target.  Do not just measure before optimising but measure on the
> target.
Agreed there.  And with realistic data sets; just because it works with
two users doesn't mean that it will perform correctly with 20 on a
machine 10 times as fast.
--
James Kanze           GABI Software        mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/     http://www.gabi-soft.fr
                    Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
>Yes and no.  There is no doubt that things like locality can greatly
>affect performance.  There is also no doubt in my mind that we generally
>have very little control over it -- I can remember one specific case
>where adding or removing a variable in main made a difference (by a
>factor of 3 or 4, IIRC) in the execution time of a function called
>before the variable was in scope.
I remember one case in which changing the size of the
arrays being processed affected the performance by a
factor of 10 (note... just the size of the array, not
the number of elements processed; in other words the
changing part was just the memory address of the elements).
The details of caching schemes of modern processors is
getting more complex every day and this part of
programming is leaving math to join physics.
Things are now so complex that just experimenting
instead of trying to do exact planning seems the only
viable solution. Things also are both complex and very
processor specific with important differences even
between binary-compatible processors.
>On the other hand, it is important to realize that the claim here only
>concerns the difference between O(n) and O(n ln n), and is made because
>ln n will always be relatively small for in memory algorithms.
My original doubt on what O(...) means was about something
else, i.e. what is the exact formal definition of O(...)
given that n is bounded in C++. From the answers I've read
in this thread looks like O(...) is indeed just a fuzzy
requirement. I'm *NOT* saying the order of the algorithms
isn't important... IMO thinking to if the solution is "O(n)"
or "O(n^2)" is very very important in programming (even if
I don't know what *formally* means O(n^2) with bounded n).
I also add that I log(n) parts are indeed almost forgot
as constant cost ones in my programs. I tend to just
forget even amortized log(n) (I mean mostly log(n) and
rarely n*log(n)).
I'm not so sure if having performance requirements for
the standard library stated in the standard is really
that important; is there any C library in which qsort uses
bubble sort ? I've also seen implementers just ignoring
the standard in places where the standard was mandating
an unjustifiably inefficient implementation.
Andrea
 > I'm not so sure if having performance requirements for
 > the standard library stated in the standard is really
 > that important; is there any C library in which qsort uses
 > bubble sort ? I've also seen implementers just ignoring
 > the standard in places where the standard was mandating
 > an unjustifiably inefficient implementation.
I don't think so [;^)], since the standard doesn't do that.
Performance requirements in the standard just set the minimum; a
vendor is always allowed to provide a more-efficient implementation.
-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com
>Andrea Griffini <agr...@tin.it> writes:
>
> > I'm not so sure if having performance requirements for
> > the standard library stated in the standard is really
> > that important; is there any C library in which qsort uses
> > bubble sort ? I've also seen implementers just ignoring
> > the standard in places where the standard was mandating
> > an unjustifiably inefficient implementation.
>
>I don't think so [;^)], since the standard doesn't do that.
>Performance requirements in the standard just set the minimum; a
>vendor is always allowed to provide a more-efficient implementation.
Of course the inefficient implementation was not directly
requested but just a consequence of inefficient code used
as specification. The problem I'm talking about was on
many unneeded temporary copies that the sample code however
(surely unintentionally) required... there were implementers
simply ignoring that bad implementation and using a better
one even if not equivalent to the code provided in the
standard. Try to count the number of temporaries required
by the original specification of std::map<...>::operator[];
even before this issue got fixed I think no implementer
used code equivalent to that.
Andrea
 > On 23 Nov 2003 09:49:26 -0500, David Abrahams
 > <da...@boost-consulting.com> wrote:
 >
 >>Andrea Griffini <agr...@tin.it> writes:
 >>
 >> > I'm not so sure if having performance requirements for
 >> > the standard library stated in the standard is really
 >> > that important; is there any C library in which qsort uses
 >> > bubble sort ? I've also seen implementers just ignoring
 >> > the standard in places where the standard was mandating
^^^^^^^^^
 >> > an unjustifiably inefficient implementation.
 >>
 >>I don't think so [;^)], since the standard doesn't do that.
 >>Performance requirements in the standard just set the minimum; a
 >>vendor is always allowed to provide a more-efficient implementation.
 >
 > Of course the inefficient implementation was not directly
 > requested but just a consequence of inefficient code used
 > as specification.
"Mandating" means more than "directly requested".  It means that the
"unjustifiably inefficient implementation" was required.
-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
>"Mandating" means more than "directly requested".  It means that the 
>"unjustifiably inefficient implementation" was required.
And indeed it was. From the last draft (that I was told was equal on
this point to the first released document)
23.3.1.2 map element access [lib.map.access]
T& operator[](const key_type& x);
  Returns:
    (*((insert(make_pair(x, T()))).first)).second.
The "returns" described this way doesn't leave much freedom and there
are a lot of pointless copies to be made to be compliant with that
description. 
No, the "golden rule" can't do much here... (or anywhere else, for that
matter): if you can tell the difference then the compiler can't do it,
and constructors can be instrumented so removing copies that that an
expression *requires* is forbidden.
Later the text has been changed to fix the problem, but I found no
implementation using "make_pair"
(it would build a pair of the wrong type, so requiring another
pair-to-pair conversion) even before the fix was in place.
In at least one case the implementer choice was of not using "insert" at
all as that is unavoidably less efficient (currently the text doesn't
place restrictions about how operator[] is implemented).
Now, IMO correctly, std::map<...>::operator[] is allowed to be executed
without the creation of any temporary and without any copy of key or
value if the key is already present in the map; but there were
implementers doing that BEFORE it was legal.
Andrea