I am stumped on this. I am sure that I am somehow causing a managed
C++ build but cannot figure out how.
Here is the deal. I am doing some micro-benchmarking just to get a
feel for the speed of C++ vs. C# vs. Java. My expectation was that C++
would be the fastest, which (using VC++ 2005) was not true, and that
led me to where I am today.
For the (silly) code (pasted at the end), here are the representative
run times (C++ only):
VC++2005: 6.61 seconds
VC++6: 3.19 seconds
g++(cygwin): 3.18 seconds
All were built as release builds. I have built the VC++ 2005
executable both from the command line (cl /O2 greedy.cpp and cl /O2
/EHsc greedy.cpp, to stop the warnings) and using Win32 release mode in
the IDE. In all cases I get a dead slow executable.
Am I missing something here?
Thanks!
Patrick
#include <vector>
#include <algorithm>
#include <iostream>
////////////////////////////////////////////////////////////////////////////////
// One entry of the cost matrix, sortable by cost.
struct RowColValue
{
    int row;
    int col;
    int value;

    RowColValue(int r, int c, int v)
        : row(r), col(c), value(v) {}

    bool operator<(RowColValue const & right) const
    {
        return value < right.value;
    }
};
////////////////////////////////////////////////////////////////////////////////
// Greedy assignment: repeatedly takes the cheapest entry whose row and
// column are both still unused.
class AssignmentTool
{
public:
    void setCosts(int** costs, int dim)
    {
        d_dim = dim;
        d_rcv.clear();
        d_rcv.reserve(dim*dim);
        for (int r = 0; r < dim; ++r)
        {
            for (int c = 0; c < dim; ++c)
            {
                d_rcv.push_back(RowColValue(r, c, costs[r][c]));
            }
        }
        std::sort(d_rcv.begin(), d_rcv.end());
        d_used_rows.resize(dim);
        d_used_cols.resize(dim);
    }

    void solve(std::vector<int> & solution)
    {
        solution.resize(d_dim);
        std::fill(d_used_rows.begin(), d_used_rows.end(), 0);
        std::fill(d_used_cols.begin(), d_used_cols.end(), 0);
        int i = 0;
        for (int n = 0; n < d_dim; ++n)
        {
            int r = -1;
            int c = -1;
            while (true)
            {
                r = d_rcv[i].row;
                c = d_rcv[i].col;
                ++i;
                if (d_used_rows[r] == 0 && d_used_cols[c] == 0)
                {
                    d_used_rows[r] = 1;
                    d_used_cols[c] = 1;
                    break;
                }
            }
            solution[c] = r;
        }
    }

private:
    int d_dim;
    std::vector<int> d_used_rows;
    std::vector<int> d_used_cols;
    std::vector<RowColValue> d_rcv;
};
////////////////////////////////////////////////////////////////////////////////
#include <windows.h>
#include <winbase.h>
////////////////////////////////////////////////////////////////////////////////
// High-resolution timer based on the Windows performance counter.
class StopWatch
{
public:
    void start()
    {
        LARGE_INTEGER t;
        QueryPerformanceCounter(&t);
        d_start = t.QuadPart;
    }

    void stop()
    {
        LARGE_INTEGER t;
        QueryPerformanceCounter(&t);
        d_stop = t.QuadPart;
    }

    double elapsed() const
    {
        double diff = double(d_stop - d_start);
        return diff / ticsPerSecond();
    }

private:
    __int64 ticsPerSecond() const
    {
        LARGE_INTEGER freq;
        QueryPerformanceFrequency(&freq);
        return freq.QuadPart;
    }

    __int64 d_start;
    __int64 d_stop;
};
////////////////////////////////////////////////////////////////////////////////
int main(int argc, char ** argv)
{
    int N = 100;

    // Allocate an N x N cost matrix as one contiguous block plus row pointers.
    int* memory = new int[N*N];
    int** costs = new int*[N];
    int* p = memory;
    for (int k = 0; k < N; ++k)
    {
        costs[k] = p;
        p += N;
    }
    for (int r = 0; r < N; ++r)
    {
        for (int c = 0; c < N; ++c)
        {
            costs[r][c] = (r + 1) * (c + 1);
        }
    }

    AssignmentTool tool;
    tool.setCosts(costs, N);

    std::vector<int> solution;
    StopWatch watch;
    std::cout << "start" << std::endl;
    watch.start();
    int count = 0;
    for (int w = 0; w < 50000; ++w)
    {
        tool.solve(solution);
        count += solution[0]; // keep the optimizer from discarding the work
    }
    watch.stop();
    std::cout << "stop" << std::endl;
    for (int i = 0; i < N; ++i)
    {
        std::cout << solution[i] << " ";
    }
    std::cout << "\n";
    std::cout << "count= " << count << "\n";
    std::cout << "time = " << watch.elapsed() << "\n";

    delete [] memory;
    delete [] costs;
    return 0;
}
////////////////////////////////////////////////////////////////////////////////
You've got a benchmark that makes artificially heavy use of the standard
library, so your benchmark is very sensitive to the speed of the standard
library - so much so that it will completely mask any effect due to differences
in code generation.
The standard library for VC++ 2005 has some anti-bugging features (known as
the "Secure SCL") that are probably slowing you down. Try adding
#define _SECURE_SCL 0
to the top of your file to disable the Secure SCL. For a more typical program,
the overhead imposed by the Secure SCL features has a minimal impact on
speed, but in the case of a standard library benchmark (which is what you
have here), it can make a huge difference.
Another thing that's changed since VC6 is the use of the small-block heap.
That can lead to significant performance differences - either positive or
negative, depending on the allocation pattern of your program. You can use:
_set_sbh_threshold(1016);
to get the VC6 behavior.
I would expect that between those two, you'll get a result from VC++ 2005
that more closely matches what you're seeing from VC6. When you compare
apples to apples (i.e. compile the same code with both compilers), VC++ 2005
consistently produces code that's as good as or better than VC6. In this
case, most of the code you're compiling (the standard library) is not the
same between your different test environments.
-cd
Thanks very much! This did indeed make a big difference:
VC++2005: 3.58 seconds (using Carl's suggestions)
VC++6: 3.19 seconds
g++(cygwin): 3.18 seconds
It is still slower than VC++6 and g++ (cygwin), but it is a lot closer.
I really question Microsoft's decision on this one. One of the main
themes of C++ is that "you don't pay for what you don't use", and one of
the goals of the STL was that it have as little overhead as possible
(that's why there is no bounds checking on vector indices, for
instance). Having a "debug mode" for the standard library is extremely
valuable, but leaving (even some) checking on _by default_ in release
mode (and only being able to turn it off by non-standard means) is
highly questionable - especially when it makes a C++ program
substantially slower than nearly identical C# and Java programs
(which force you to always leave the checking on).
Not blaming you of course :-)
-Patrick
Believe me, you're not the first to question the logic :)
On the other hand, consider that computers are 1000x faster than they were
just a few years ago - in most cases, far faster than required for the task
at hand (and no, games don't constitute "most cases" - they're a very
special breed with their own special needs). Programmers who've "not paid
for" what they "didn't use" have managed to write thousands of buffer overrun
bugs and other vulnerabilities, so maybe it IS time to change the defaults
to have the "seatbelts" on unless you specifically decide to turn them off.
Another thing that changed in VC 2005 - there's no longer a single-threaded
CRT. For artificial benchmarks like this, that too can lead to surprising
results. If you're not doing so already, try building your VC6 and g++ code
using the multi-threaded runtimes.
Also, the VC++ 2005 standard library is far more conformant to the standard
than VC6's (I don't know about g++, since there are so many versions with
such a wide variety of conformance that it's impossible to speculate). I
can't say for sure, but it's possible that correctness fixes in the VC++
2005 standard library also have an adverse effect on your benchmark.
-cd
That is all quite reasonable and, if the extra checking could be done
for little cost, I might agree. Unfortunately, the overhead can be
quite large. The truly surprising part is that it makes the C++
version of the algorithm much slower than the C# or Java versions
(which probably have as much or more checking on all the time).
C++ (like C) is a powerful but dangerous language. If you
don't need the extra speed and power it offers (and many applications
don't), it might make sense to use a language with fewer "sharp edges"
(i.e. Java or C#). I don't think using a "managed" language makes one
a less "manly" programmer... it's all about using the right tool for the
job at hand.
I work in an application domain where CPU cycles still count
(simulation of complex systems). It is true that computers keep
getting faster. The flip side is that the systems we simulate keep
getting more complicated (they use those faster computers too...) and
the fidelity that is expected continues to increase. I need the trade
off of speed vs. safety that C++ is supposed to provide.
-Patrick
You might also try the Intel C++ compiler. It bolts directly into the
Visual Studio IDE and can produce tighter and faster code.
Eric
I suggest you try using STLPort library, rather than the version that
ships with VC++ 2005.
Any arguments to back up this statement?
It's not a statement. It's a suggestion. There is usually
no need to back up any suggestions. There is no need to argue,
one should simply try and compare. After that there will be
results that can be presented, and then we can speculate on
the causes for those results. With statements, arguments to
back them up, et cetera.
V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
I admit that my post should have been more elaborate. So, the long
version of the question is: why should the original poster try
_using_ the STLport library instead of the version that ships with
VC++ 2005?
The above-mentioned suggestion was phrased in a way which
implies that in order to get decent results, the default library
(that ships with VS) should be ditched in favor of STLport.
That's why I asked for some arguments to support this implication.
It's all in your head. :-) "Try using" was most likely preferred
over "try reading about" or "try looking at".
So, what about "rather than" part? :) Probably, I'm way too
anal, too. After all, there must be a reason why I'm hanging
around here in programming groups instead of getting a
life...
It's possible to have both. I do.
I have a wife and a lovely 1.5 year old daughter. I am also renovating parts
of my house (i.e. I don't have 2 left hands).
So, how do I find the time to hang out here, write articles and maintain a
blog?
Simple: C++ is my hobby.
Oh, wait...
Luckily, my wife has lots of patience, and she has accepted me as the geek I
am.
--
Kind regards,
Bruno van Dooren
bruno_nos_pa...@hotmail.com
Remove only "_nos_pam"
and you've forgotten the drivers :)
--PA
Well, Victor was kind enough to answer it already :)
However, I must add that I know at least one case when a developer was
complaining that his (STL-heavy) applications slowed down significantly
(3 times if I remember well) after porting it from VC6 to VC 2005. When
he used STLPort, the speed was back to normal.
Of course, this does not imply that STLPort is *always* faster than
Dinkumware, just that *in some cases* it may be :)
Regards,
--PA
<pla...@alumni.caltech.edu> wrote in message news:1159448086.3...@d34g2000cwd.googlegroups.com...
Nor does it imply that STLPort is any better. Faster is
not always better if it achieves speed by losing
correctness.
FWIW I haven't noticed any problems with MSVC7.1 and
dinkumware. Code performs as well as Borland's BCB 6
with STLPort except that dinkumware is less problematic
when it comes to standard compliance. (Not sure which
version of STLPort BCB was using though)
It's already been noted that MSVC8
adds some debugging that can be disabled. Did this
help the OP?
The problem with the original poster's benchmark (as with any
other artificial test) is that we don't know to what extent
this test reflects real-life workloads. I purposely won't
bring up the argument of whether the additional checks are worth
the potential slowdown (although, in my opinion, they mostly are).
I do believe that there may exist an application which relies
heavily on STL algorithm performance. For that sort of
application, the STL does have means to tune performance.
However, before any attempt is made to use those means, the
developer should ask himself whether the application he
writes requires any speed-up efforts. Most of the time
it doesn't.
The suggestion to switch to an STL other than the default one (with
all the cost that results from that) just because of an
artificial benchmark speed test doesn't sound solid enough
to me. There was a good discussion about this very topic in
microsoft.public.vc.stl recently. See the "STL Slow - VS2005"
thread started on the 17th of August.
"STL Slow - VS2005"
http://groups.google.com/group/microsoft.public.vc.stl/browse_frm/thread/a37b93427276bc26/6d454bf5ebcf2829#6d454bf5ebcf2829
Alex
I agree. The case I mentioned, though, wasn't a benchmark, but a
real-life application.
> I purposely won't
> bring the argument whether additional checks worth potential
> slowdown (although, in my opinion, they mostly worth it).
In my opinion, they are not. We are using C++ not because it is "safe",
but because it is fast and flexible. When I am ready to pay the
"safety" toll in performance, I simply use a language that was designed
with that mindset.
> The suggestion to switch to other STL than default one (with
> all the cost that results from that) just because of
> artificial benchmark speed test doesn't sound solid enough
> for me.
In general I agree, but OP is obviously a CS student and arguments like
these are not applicable. Switching between several STL implementations
is a good learning exercise, IMHO.
Now you are implying that using STLPort leads to incorrect code, which
is not true, AFAIK. A library can be fast *and* correct.
>
> FWIW I haven't noticed any problems with MSVC7.1 and
> dinkumware.
Neither have I. Here, we are talking about MSVC8.
First of all, this principle for the language itself hasn't
changed. We still get "what we paid for" and only that with a
C++ compiler. Second, the Standard Library now performs more
checks by default because the surrounding world changes and
imposes new requirements on software development. I agree
with Carl Daniel on that issue; see his comments above in
this thread. Doing another couple of checks is a negligible
"waste" of resources for a modern computer. However, the ROI
of these checks is high.
The higher importance of stable and secure software is
universally recognized nowadays. That's why library vendors
incorporate more checks. These improvements are accepted by
the developer community rather than rejected. This happens
due to the simple fact that the price of a software failure has
become higher than the presence of a "redundant" check. Otherwise
the industry wouldn't accept these changes in libraries.
Moreover, you can turn the checks off at any time, so if you
don't want to pay you're not required to.
>> The suggestion to switch to other STL than default one
>> (with
>> all the cost that results from that) just because of
>> artificial benchmark speed test doesn't sound solid
>> enough
>> for me.
>
> In general I agree, but OP is obviously a CS student and
> arguments like
> these are not applicable. Switching between several STL
> implementations
> is a good learning exercise, IMHO.
Agreed.
Alex
Ah yes, the drivers. My other vice ;-)
First of all we need to distinguish between libraries and applications.
For many real-life applications, Carl's reasoning makes a lot of sense,
and btw that's the reason many application developers are moving to
"safer" programming languages these days.
Libraries in general, and the Standard Library in particular, are quite
a different color, though. They need to be both fast and correct.
*Another couple of checks* that result in operations being slower by an
order of magnitude is not acceptable - at least not in Release mode.
I totally agree with that. That's why I made clear
distinction between the language itself (which didn't change
in that respect) and accompanying library (which is more
flexible in accommodating to changing reality).
> Libraries in general, and the Standard Library in
> particular are quite
> a different color, though. They need to be both fast and
> correct.
> *Another couple of checks* that result in operations being
> slower by a
> magnitude is not acceptable - at least not in Release
> mode.
The fact that the most widely used Standard Library
implementation decided to enable checked iterators by
default speaks for itself. It appears that for the overwhelming
majority of customers these checks are either desirable or
satisfactory enough to keep them enabled. Such decisions
aren't made out of mischievous spirit but reflect the industry's
demand. You compare current iterator performance to the old one
as if it's some kind of standard. However, it is not. Such an
argument is akin to the opposition to C++ expressed by some C
die-hards because a virtual function call "holds back performance"
and implicit constructor calls put the "coder out of control",
etc. The same kind of arguments were made by ASM die-hards to C
supporters.
In the same manner as most coders willingly pay the cost of
virtual calls in order to get a richer language, they
willingly sacrifice a couple of CPU cycles to obtain more
stable, secure code. Even though iterator speed was
degraded by an order of magnitude, the time that an average
application spends within iterator code is nothing
compared to its other parts. That's why overall performance is
hardly hurt while the overall robustness of the code improves
greatly.
> Now you are implying that using STLPort leads to incorrect code, which
> is not true, AFAIK. A library can be fast *and* correct.
I don't mean to imply anything. I'm just suggesting that switching
libraries may bring unsuspected surprises in other areas.
>>
>> FWIW I haven't noticed any problems with MSVC7.1 and
>> dinkumware.
>
> Neither have I. Here, we are talking about MSVC8.
Right. So what's the difference? AFAICT it's the added iterator
debugging which can be disabled.
I am comparing the current STL implementation to the competing ones,
and I find the latter to be faster without losing any functionality. I
am working on a soft real-time system, which on the server side is
implemented on Linux, and if libstdc++ had such a drop in performance
between versions, we would have replaced it with STLport or some other
implementation - as simple as that. On the client side, where we use
VC++, speed is not as critical, so we can live with the new library, but
the fact remains that we lose performance and gain nothing in return.
> Such argument is akin to oppositon to C++ expressed by some C die
> hards because virtual function call "holds back performance"
Don't know about C die-hards, but the fact is that virtual functions
are not slower than equivalent C code (basically a switch-case
construct on a type tag) and they give clear benefits. Slowing down a
library without a real benefit worries me, though.
Debugging is OK in Debug mode, not in Release mode, IMHO.
So logically there must be a reason - security.
> Don't know about C die hards, but the fact is that virtual functions
> are not slower than equivalent C code (basically a switch - case
> construct on a type) and they give clear benefits. Slowing down a
> library without a real benefit worries me, though.
And what is the problem with simply defining a symbol which disables
the "release debug iterators"? AFAIK not all debugging features are
active in release builds.
Andre
Agreed. So add the define that turns off iterator
debugging in your release build. Or am I missing
something?
You gain more stable software. You gain improved integrity
of standard containers. That is my point precisely. Most
applications are ready to pay the price of the iterator
performance slowdown in order to get more security and
stability at runtime. Real-time systems have very
special requirements that the majority of other applications don't
have. The Standard Library provides a generic solution, after
all. It is not a real-time framework. For generic applications,
iterator checking is good and welcome. However, even for
performance-oriented software there is a way to
disable the iterator checks.
If a competing implementation doesn't have this feature, then
that is a disadvantage. Instead of boasting about this fact,
those implementations should keep silent and meanwhile
develop this feature ASAP to be competitive again. Checking
iterator consistency, buffer lengths, stacks, etc. is a
good thing. If a compiler/library cannot provide such checks
then it simply doesn't keep pace with the times.
Well, there was (probably) a simpler solution to the problem than
switching to STLport.
> Of course, this does not imply that STLPort is *always* faster than
> Dinkumware, just that *in some cases* it may be :)
In this case, I get 4.47 seconds with STLport and 4.38 seconds with
VC8's default library (with #define _SECURE_SCL 0)...
Tom
The times only measure the (repeated) execution of the algorithm. See
the code in the original post.
-Patrick