Anyone else seeing the same, or know how to improve performance? We tried
enabling small-block allocation and setting the security-checks define to 0,
but no luck; performance got even worse.
VS2005
=====
W:\test\string_tests\VC8\Release>string_tests.exe
Append one character to empty string 100 million times: 6585 ms
Search long string for 3 different substrings 10 million times: 3272 ms
Assignment, search, replace 10 million times: 3537 ms
Pass and return by value function called 10 million times: 8288 ms
Pass by reference return by value function called 10 million times: 4186 ms
Short String(10 chars) - Pass and return by value function called 10 million times: 1464 ms
VS2005/STLPORT5.0
===============
W:\test\string_tests\VC8\Release>string_tests.exe
Append one character to empty string 100 million times: 1841 ms
Search long string for 3 different substrings 10 million times: 3091 ms
Assignment, search, replace 10 million times: 2664 ms
Pass and return by value function called 10 million times: 3576 ms
Pass by reference return by value function called 10 million times: 1827 ms
Short String(10 chars) - Pass and return by value function called 10 million times: 820 ms
Without knowing exactly how you built the code, and exactly what the code is
doing, it's impossible to comment beyond the superficial.
Can you post your test program for others to inspect and verify your
results?
-cd
// string_compare.cpp : Defines the entry point for the console application.
//
#include <cstdio>
#include <string>
// The USE_STLPORT define is needed for VC6.
// In VC8, namespace std is used.
#ifdef USE_STLPORT
#pragma message ("USING STLPORT")
using namespace stlport;
#else
using namespace std;
#endif
string func( string par )
{
    string tmp( par );
    return tmp;
}

string func2( string& par )
{
    string tmp( par );
    return tmp;
}

int main( int, char* const* )
{
    // HighResPerformanceMeasurement: the poster's own high-resolution
    // timer class (definition not shown in the post).
    HighResPerformanceMeasurement usecTimer;
    {
        usecTimer.Start();
        string s;
        for ( int i = 0; i < 100000000; ++i ) {
            s += " ";
        }
        usecTimer.End();
        double msecs = double((usecTimer.GetDifference()*1000)/usecTimer.GetFrequency());
        printf("Append one character to empty string 100 million times: %.0f ms\n", msecs);
    }
    {
        usecTimer.Start();
        string s("qyweyuewunfkHBUKGYUGL ,wehbYGUW^( @T@H!BALWD:h^&@#*@(#: JKHWJ:CND");
        for ( int i = 0; i < 10000000; ++i ) {
            s.find( "unfkHBUKGY" );
            s.find( "W^( @T@H!B" );
            s.find( "J:CND" );
        }
        usecTimer.End();
        double msecs = double((usecTimer.GetDifference()*1000)/usecTimer.GetFrequency());
        printf("Search long string for 3 different substrings 10 million times: %.0f ms\n", msecs);
    }
    {
        usecTimer.Start();
        string s("qyweyuewunfkHBUKGYUGL ,wehbYGUW^( @T@H!BALWD:h^&@#*@(#: JKHWJ:CND");
        string::size_type p;
        string ss1( "unfkHBUKGY" );
        string ss2( "123456" );
        string sx;
        for ( int i = 0; i < 10000000; ++i ) {
            sx = s;
            p = sx.find( ss1 );
            sx.replace( p, ss1.size(), ss2 );
            sx += s;
        }
        usecTimer.End();
        double msecs = double((usecTimer.GetDifference()*1000)/usecTimer.GetFrequency());
        printf("Assignment, search, replace 10 million times: %.0f ms\n", msecs);
    }
    {
        usecTimer.Start();
        string s("qyweyuewunfkHBUKGYUGL ,wehbYGUW^( @T@H!BALWD:h^&@#*@(#: JKHWJ:CND");
        for ( int i = 0; i < 10000000; ++i ) {
            string sx = func( s );
        }
        usecTimer.End();
        double msecs = double((usecTimer.GetDifference()*1000)/usecTimer.GetFrequency());
        printf("Pass and return by value function called 10 million times: %.0f ms\n", msecs);
    }
    {
        usecTimer.Start();
        string s("qyweyuewunfkHBUKGYUGL ,wehbYGUW^( @T@H!BALWD:h^&@#*@(#: JKHWJ:CND");
        for ( int i = 0; i < 10000000; ++i ) {
            string sx = func2( s );
        }
        usecTimer.End();
        double msecs = double((usecTimer.GetDifference()*1000)/usecTimer.GetFrequency());
        printf("Pass by reference return by value function called 10 million times: %.0f ms\n", msecs);
    }
    {
        usecTimer.Start();
        string s("1234567890");
        for ( int i = 0; i < 10000000; ++i ) {
            string sx = func( s );
        }
        usecTimer.End();
        double msecs = double((usecTimer.GetDifference()*1000)/usecTimer.GetFrequency());
        printf("Short String - Pass and return by value function called 10 million times: %.0f ms\n", msecs);
    }
    return 0;
}
But the interesting stuff - how you compiled your code - is left out.
Debug builds are far slower than release builds, of course. And the
Microsoft std library does have a lot of additional functionality in
debug mode. This costs execution time but saves lots of time debugging.
/Peter
In his initial post he wrote:
| W:\test\string_tests\VC8\Release>string_tests.exe
[...test output...]
Note the 'Release' in the path; I'm guessing the default release
compilation.
Uli
I'll see if I can dig up the exact project file. But try it yourself under
whatever settings you like; in terms of relative performance it won't
matter, the Microsoft implementation is still significantly slower than the
freeware open-source one, both running in release mode.
for extra performance in Release mode use option /D_SECURE_SCL=0 (and
/D_SCL_SECURE_NO_DEPRECATE for nicer compiler output). See also
http://msdn2.microsoft.com/en-us/library/ms404587.aspx
B.
The first and most crucial question to ask of any benchmark is how
accurately this benchmark represents real-world workloads. Unless the
benchmark is quite close to real-world usage, it's meaningless -- and
even if it's fairly close, it can still be entirely meaningless (if the
differences between "fairly close" and "real world" make a big
difference in performance).
[ ... ]
> string s ;
> for ( int i = 0 ; i < 100000000; ++ i ) {
> s += " " ;
People have done quite a few studies on string lengths as they're
actually used. Every time I've seen such a study, it's concluded that
most strings are quite short -- most people find that over 90% of
strings are 80 characters or less, and the frequency of usage drops off
quickly with length -- even 1000 characters is extremely rare. A string
of 100 megabytes is so rare that your measurement means nothing about
real use.
[ ... ]
> string s("qyweyuewunfkHBUKGYUGL ,wehbYGUW^( @T@H!BALWD:h^&@#*@(#:
> JKHWJ:CND");
> for (int i = 0 ; i < 10000000; ++i) {
> s.find( "unfkHBUKGY" ) ;
> s.find( "W^( @T@H!B" ) ;
> s.find( "J:CND" ) ;
Likewise, in the real world, most strings consist primarily (if not
exclusively) of a-z, A-Z, space, and a FEW punctuation characters. I
haven't checked, but my immediate guess is that a significant difference
in string searching speed likely results from using a Boyer-Moore (or
BMH) search algorithm. Restricting the alphabet in use also
significantly reduces the advantage gained from such an algorithm.
That's not to say that such an advanced algorithm can't be useful -- but
my immediate guess is that the strings chosen above significantly
exaggerate the advantage likely to be gained in most real use.
> string s("qyweyuewunfkHBUKGYUGL ,wehbYGUW^( @T@H!BALWD:h^&@#*@(#:
> JKHWJ:CND");
[ ... ]
> printf("Assignment, search, replace 10 million times: %.0f ms\n", msecs);
The comments about searching apply here as well.
[ ... ]
> string s("qyweyuewunfkHBUKGYUGL ,wehbYGUW^( @T@H!BALWD:h^&@#*@(#:
> JKHWJ:CND");
[ ... ]
> printf("Pass and return by value function called 10 million times: %.0f
> ms\n", msecs);
I can't imagine anybody passing strings by value except by accident.
Virtually every book on C++ teaches even rank beginners to pass class
objects by ref-to-const by about chapter 2. This benchmark seems to
apply only to thoroughly inept code.
> string s("qyweyuewunfkHBUKGYUGL ,wehbYGUW^( @T@H!BALWD:h^&@#*@(#:
> JKHWJ:CND");
[ ... ]
> printf("Pass by reference return by value function called 10 million
> times: %.0f ms\n", msecs);
The first comments on string length apply here as well, though obviously
to a somewhat lesser degree -- this string is somewhat longer than
usual, but at least not nearly so outrageously so.
> string s("1234567890");
[ ... ]
> printf("Short String - Pass and return by value function called 10 million
> times: %.0f ms\n", msecs);
Passing by value again? If I didn't know better, I'd guess you carefully
avoided the two (by far) most common situations: 1) pass a short string
by reference and return a short string by value, 2) pass a short string
by reference, modify a string whose reference was passed.
Of course, the second of these doesn't really measure much about the
string implementation at all -- and if the compiler includes named
and/or anonymous return value optimization, the first probably doesn't
either. To make a long story short, this entire line of testing only
means something to 1) people who have no clue how to use C++, or 2)
people stuck using old compilers without any return value optimization
capability.
Given STLPort's target of supporting nearly all compilers, the latter is
probably meaningful for the library. Unless I'm mistaken, VS 2005 does
include RVO, so it means little in this context though.
Ultimately, I can't see anything here that seems to indicate much of
anything about whether (for example) I'd get better or worse performance
in my programs if I used STLPort's string class instead of Dinkum's. It
might happen, but these tests just don't show anything meaningful one
way or the other.
--
Later,
Jerry.
The universe is a figment of its own imagination.
This one is especially funny, as a small change into
s += ' ';
makes it run *a lot* faster.
What realistic application does a significant number of appends
with single-character string literals?
Bo Persson
[ ... ]
> >> for ( int i = 0 ; i < 100000000; ++ i ) {
> >> s += " " ;
>
> This one is especially funny, as a small change into
>
> s += ' ';
>
> makes it run *a lot* faster.
Yup.
> What realistic application is doing a significant number of append
> with single character string literals?
I can't say I remember having read or written too many of those...
I think that your assessment is reasonable and we should conduct a more real
world performance test with an existing application's code, and then if we
discover particular issues, break those scenarios out for a performance tester.
Thanks,
Dave
As Jerry pointed out as well some more real world scenarios are probably a
better way to test.
The differences between STLport and the VS STL in these particular test
scenarios are still a little surprising. It is possible these do map to
more real world scenarios, but until we verify we don't really know.
Certainly this does test specific cases where STLport is faster than the
VS STL, but you are right that the test cases might not be close enough to a
real world scenario to ensure the results would apply. We'll do some more
real world testing to see whether these numbers map to our real world app.
See last test in each, short string.
VS2005
==================================================================
W:\test\string_tests\VC8\Release>string_tests.exe
Append one character to empty string 100 million times: 6653 ms
Search long string for 3 different substrings 10 million times: 3361 ms
Assignment, search, replace 10 million times: 3560 ms
Pass and return by value function called 10 million times: 8317 ms
Pass by reference return by value function called 10 million times: 4177 ms
Short String - Pass and return by value function called 10 million times: 1465 ms
Short String - Pass by reference return by value function called 10 million times: 743 ms
VS2005 & STLPORT 5
Append one character to empty string 100 million times: 1759 ms
Search long string for 3 different substrings 10 million times: 3114 ms
Assignment, search, replace 10 million times: 2667 ms
Pass and return by value function called 10 million times: 3597 ms
Pass by reference return by value function called 10 million times: 1816 ms
Short String - Pass and return by value function called 10 million times: 822 ms
Short String - Pass by reference return by value function called 10 million times: 427 ms
string func2( string& par )
{
    string tmp( par );
    return tmp;
}

{
    usecTimer.Start();
    string s("1234567890");
    for ( int i = 0; i < 10000000; ++i ) {
        string sx = func2( s );
    }
    usecTimer.End();
    double msecs = double((usecTimer.GetDifference()*1000)/usecTimer.GetFrequency());
    printf("Short String - Pass by reference return by value function called 10 million times: %.0f ms\n", msecs);
}
Compiler options:
/O2 /I "W:\3rd Party Software\STLport\stlport" /I "v:\include" /D "WIN32" /D
"NDEBUG" /D "_CONSOLE" /FD /EHsc /MD /Fo"VC8\Release\\"
/Fd"VC8\Release\vc80.pdb" /W3 /nologo /c /Wp64 /Zi /TP /errorReport:prompt
Linker Options:
/OUT:"VC8\Release\string_tests.exe" /INCREMENTAL:NO /NOLOGO /LIBPATH:"W:\3rd
Party Software\STLport\stlport" /MANIFEST
/MANIFESTFILE:"VC8\Release\string_tests.exe.intermediate.manifest" /DEBUG
/PDB:"w:\test\string_tests\VC8\Release\string_tests.pdb" /SUBSYSTEM:CONSOLE
/OPT:REF /OPT:ICF /MACHINE:X86 /ERRORREPORT:PROMPT stlport.5.0.lib
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib
shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib
"Jerry Coffin" wrote:
> The first and most crucial question to ask of any benchmark is how
> accurately this benchmark represents real-world workloads. Unless the
> benchmark is quite close to real-world usage, it's meaningless -- and
> even if it's fairly close, it can still be entirely meaningless (if the
> differences between "fairly close" and "real world" make a big
> difference in performance).
Yes, but I am microbenchmarking a small subset of the STL string
implementation only. Most real world apps will have bigger bottlenecks to
deal with and these benchmarks will not matter much. The benchmarks do point
to a more efficient implementation in STLPort.
> > string s ;
> > for ( int i = 0 ; i < 100000000; ++ i ) {
> > s += " " ;
>
> People have done quite a few studies on string lengths as they're
> actually used. Every time I've seen such a study, it's concluded that
> most strings are quite short -- most people find that over 90% of
> strings are 80 characters or less, and the frequency of usage drops off
> quickly with length -- even 1000 characters is extremely rare. A string
> of 100 megabytes is so rare that your measurement means nothing about
> real use.
>
Test was meant to benchmark the performance when the string must do new
allocations. This was an easy way to get at that.
>
> > string s("qyweyuewunfkHBUKGYUGL ,wehbYGUW^( @T@H!BALWD:h^&@#*@(#:
> > JKHWJ:CND");
> > for (int i = 0 ; i < 10000000; ++i) {
> > s.find( "unfkHBUKGY" ) ;
> > s.find( "W^( @T@H!B" ) ;
> > s.find( "J:CND" ) ;
>
> Likewise, in the real world, most strings consist primarily (if not
> exclusively) of a-z, A-Z, space, and a FEW punctuation characters. I
> haven't checked, but my immediate guess is that a significant difference
> in string searching speed likely results from using a Boyer-Moore (or
> BMH) search algorithm. Restricting the alphabet in use also
> significantly reduces the advantage gained from such an algorithm.
>
> That's not to say that such an advanced algorithm can't be useful -- but
> my immediate guess is that the strings chosen above significantly
> exaggerate the advantage likely to be gained in most real use.
>
Whatever. STLPort searched faster. Try it with other strings and see what
happens.
> > string s("qyweyuewunfkHBUKGYUGL ,wehbYGUW^( @T@H!BALWD:h^&@#*@(#:
> > JKHWJ:CND");
> [ ... ]
> > printf("Pass and return by value function called 10 million times: %.0f
> > ms\n", msecs);
>
> I can't imagine anybody passing strings by value except by accident.
> Virtually every book on C++ teaches even rank beginners to pass class-
> objects by ref-to-const by about chapter 2. This benchmark seems to
> apply only to thoroughly inept code.
>
Test was meant to benchmark the copy constructor, and return value
optimization. This was an easy way to test that.
> > string s("1234567890");
> [ ... ]
> > printf("Short String - Pass and return by value function called 10 million
> > times: %.0f ms\n", msecs);
>
> Passing by value again?
Again, to test the copy constructor on short strings.
> If I didn't know better, I'd guess you carefully
> avoided the two (by far) most common situations: 1) pass a short string
> by reference and return a short string by value, 2) pass a short string
> by reference, modify a string whose reference was passed.
OK. Here goes with these two tests included (code at end):
VS2005
==================================================================
W:\test\string_tests\VC8\Release>string_tests.exe
Append one character to empty string 100 million times: 6791 ms
Search long string for 3 different substrings 10 million times: 3367 ms
Assignment, search, replace 10 million times: 3551 ms
Pass and return by value function called 10 million times: 8362 ms
Pass by reference return by value function called 10 million times: 4222 ms
Short String - Pass and return by value function called 10 million times: 1460 ms
Short String - Pass by reference return by value function called 10 million times: 741 ms
Short String - Pass by reference modify reference called 10 million times: 643 ms
VS2005 & STLPORT
===================================================================
W:\test\string_tests\VC8\Release>string_tests.exe
Append one character to empty string 100 million times: 1804 ms
Search long string for 3 different substrings 10 million times: 3134 ms
Assignment, search, replace 10 million times: 2611 ms
Pass and return by value function called 10 million times: 3839 ms
Pass by reference return by value function called 10 million times: 1885 ms
Short String - Pass and return by value function called 10 million times: 829 ms
Short String - Pass by reference return by value function called 10 million times: 421 ms
Short String - Pass by reference modify reference called 10 million times: 245 ms
{
    usecTimer.Start();
    string s("1234567890");
    for ( int i = 0; i < 10000000; ++i ) {
        string sx = func2( s );
    }
    usecTimer.End();
    double msecs = double((usecTimer.GetDifference()*1000)/usecTimer.GetFrequency());
    printf("Short String - Pass by reference return by value function called 10 million times: %.0f ms\n", msecs);
}

void func3( string& par )
{
    par = "1234123412";
}

{
    usecTimer.Start();
    string s("1234567890");
    for ( int i = 0; i < 10000000; ++i ) {
        func3( s );
    }
    usecTimer.End();
    double msecs = double((usecTimer.GetDifference()*1000)/usecTimer.GetFrequency());
    printf("Short String - Pass by reference modify reference called 10 million times: %.0f ms\n", msecs);
}
-ephi
> Test was meant to benchmark the performance when the string must do
> new
> allocations. This was an easy way to get at that.
>
But in practice it unfortunately doesn't measure that. The operator +=
probably maps to
append(String, traits_type::length(String));
In my implementation I got about a 10 times speedup by adding the
special case
basic_string& append(const value_type* _String, size_type _StringSize)
{
    if (_StringSize == 1)
        push_back(*_String);
    else
        ...
}
So, you are measuring one *very* specific optimization of the standard
library implementation. Adding a single space character could be
s += " ";
s += ' ';
s.push_back(' ');
s.append(" ");
s.append(1, ' ');
You are just measuring the first one.
Bo Persson
"Bo Persson" wrote:
[...]
> >> > string s ;
> >> > for ( int i = 0 ; i < 100000000; ++ i ) {
> >> > s += " " ;
> >>
>
> > Test was meant to benchmark the performance when the string must do
> > new
> > allocations. This was an easy way to get at that.
> >
>
> But in practice it unfortunately doesn't measure that. The operator +=
> probably maps to
>
> append(String, traits_type::length(String));
>
> In my implementation I got about a 10 times speedup by adding the
> special case
>
> basic_string& append(const value_type* _String, size_type _StringSize)
> {
> if (_StringSize == 1)
> push_back(*_String);
> else
> ...
> }
Again. I'm trying to factor out the append optimizations and just see which
implementation is better at the allocations.
I'm assuming all the append logic time will be minimal compared to the
number of allocations that need to be done after 100 million appends.
> So, you are measuring one *very* specific optimization of the standard
> library implementation. Adding a single space character could be
>
> s += " ";
> s += ' ';
> s.push_back(' ');
> s.append(" ");
> s.append(1, ' ');
>
> You are just measuring the first one.
OK. Here are the other tests (the push_back is especially scary):
For what it's worth I am not trying to dis VS2005 STL. It is probably more
compliant than STLPort but these tests clearly show a less efficient
implementation compared to STLPort. And for most projects this probably
doesn't matter that much (as larger bottlenecks usually exist). I'm looking
for suggestions, project property options or anything else I can use so I am
not forced into using STLPort.
VS2005
======
W:\test\string_tests\VC8\Release>string_tests.exe
Append one character ( += " " ) to empty string 100 million times: 6663 ms
Append one character ( += ' ' ) to empty string 100 million times: 3156 ms
Append one character ( push_back(' ') ) to empty string 100 million times: 1497 ms
Append one character ( append(" ") ) to empty string 100 million times: 6610 ms
Append one character ( append(1,' ') ) to empty string 100 million times: 2870 ms
VS2005 w/STLPort
==============
W:\test\string_tests\VC8\Release>string_tests.exe
Append one character ( += " " ) to empty string 100 million times: 2076 ms
Append one character ( += ' ' ) to empty string 100 million times: 1496 ms
Append one character ( push_back(' ') ) to empty string 100 million times: 1065 ms
Append one character ( append(" ") ) to empty string 100 million times: 2009 ms
Append one character ( append(1,' ') ) to empty string 100 million times: 2390 ms
Short String - Pass by reference return by value function called 10 million times: 420 ms
Short String - Pass by reference modify reference called 10 million times: 239 ms
Do they really?
Imagine you are implementing list.
What will you do in the allocator?
1) Will you "remember" previous nodes on deallocation?
2) Will you free previous nodes on deallocation?
1) and 2) are mutually exclusive.
1) will give fast node reuse but heap memory use will increase
2) will give slower node reuse but heap memory use will be minimal as it is
recycled.
Your tests are too simplistic.
They don't measure how well the library designers have implemented their
STL. Instead they measure the default choice of allocator that comes
out-of-the-box with the compiler.
I am certain that accounts for your performance difference.
You're seeing the usual space-versus-time trade-offs that we all know.
Now I know that Dinkumware sells some add-on package that gives you various
different drop-in replacement allocators. And similarly STLPort also has
alternative allocators if it matters that much.
There is no right-or-wrong here. I think you will find that STLPort has a
bigger footprint in terms of memory consumed - nothing wrong with that, it
is the library choices made. In a 16-bit environment or an embedded ROM
environment that might be the wrong choice.
If there is any fault, I blame the C++ standard because, apart from limited
control over vector, deque and string, it is impossible for the programmer
to inform the library designers what their intention is with a container. If
your application uses large amounts of heap memory across several
containers, enough to bust OS limits, you don't want any container to
reserve spare memory for later use. And contrariwise, if you are constantly
reusing a container which grows and shrinks, for performance reasons you
want it to recycle the same memory again and again.
I would consider the default out-of-the-box allocator as part of the tests,
but it would not be the only thing I would look at.
I would _ALSO_ like to make some tests that are allocator-neutral and also
don't depend on the choice of growth factor for vector and string. You have
not done that.
And I could beat STLPort in terms of your tests with my own STL
implementation: I would make vector, string, deque and stringstreams grow by
a factor of 64 whenever capacity is reached. For string's small-string
optimisation I would ensure there is space for 1K right from the outset, so
at a bare minimum sizeof(std::string) would be 1024. The strings would be
obese. But the cost of this is astronomical: the memory footprint is huge.
By your criterion, your tests would likely reveal that my STL
implementation is the most "efficient" one out there.
But it would be a poor conclusion drawn from oversimple tests.
Stephen Howe
They don't "clearly" show that.
See my other post on allocators and container growth rates, which is what
you _ARE_ testing.
Did you check memory footprints halfway through? No? Perhaps it does not
matter for your application.
Here are some examples which do show something about efficiency.
Try std::sort() with many duplicates and an instrumented ctor, copy ctor,
assignment operator and dtor.
I think you will find Dinkumware comes out on top because last time I looked
STLPort does not implement fatpivot. Dinkumware does.
And it is possible to feed some nasty distribution to Dinkumware/STLPort
where std::nth_element() is not O(N) because neither implementation switches
to an alternative algorithm if quickselect goes bad.
And the latest STLPort's std::stable_partition() handles a few more cases
where auxiliary memory needs no allocation.
Both use a form of heapsort for std::partial_sort() which is what the STL
designers had in mind but AFAIK, a variation of quicksort can be used which
is more efficient.
All that I have seen, by inspection, proves to me that both implementations
are written extremely well.
But you are not measuring that.
> Whatever. STLPort searched faster. Try it with other strings and see what
> happens.
I will have a look to see how find is implemented and then feed a worst case
to STLPort.
There is nothing magical about std::string::find() - the library designer
usually has to make what they think is a good _general_ choice. Some
algorithms have a startup cost but afterwards run fast: simple
first-character scan then memcmp, Rabin-Karp, Knuth-Morris-Pratt,
Boyer-Moore... they all have good and bad cases.
Stephen Howe
Yes, right. And if you read the PDF file that Boris provided the link to,
when you get to section 6 you can see what role the allocator plays.
Stephen Howe