Numerics: Visual C++ vs. g++

Evgenii Rudnyi

May 22, 2008, 5:16:04 AM

I have been working with g++ for a long time, but for internal reasons
I now have to use Visual C++ Express Edition. I have run quick tests
with my C and C++ code that implements the naive multiplication of
matrices

http://matrixprogramming.com/MatrixMultiply/code/2direct/

with the goal of seeing how each compiler optimizes the loops. The
commands to compile and run the tests are in the makefile named
compare, and

make -f compare

compiles and runs the tests with GCC and Visual C++. Below are the
results with gcc 3.3 under Cygwin and Visual C++ 2005 Express Edition
on my HP notebook:

$ make -f compare

gcc -s -O3 -Wl,--stack=50000000 direct1.c -o direct1-gcc.exe
direct1-gcc.exe
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 2.453000 s

cl -O2 -nologo direct1.c -link -STACK:50000000 -out:direct1-vc.exe
direct1.c
direct1-vc.exe
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 1.984000 s

gcc -s -O3 -Wl,--stack=50000000 direct2.c -o direct2-gcc.exe
direct2-gcc.exe
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 2.047000 s

cl -O2 -nologo direct2.c -link -STACK:50000000 -out:direct2-vc.exe
direct2.c
direct2-vc.exe
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 1.985000 s

g++ -s -O3 direct.cc -o direct-gcc.exe
direct-gcc.exe 1000
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 2 s

cl -EHsc -O2 -nologo -DUSECLOCK direct.cc -link -out:direct-vc.exe
direct.cc
direct-vc.exe 1000
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 6.969 s

One can see that for the C code VC produces slightly faster code, but
for the C++ code it is more than 3 times slower. I am new to VC++ and
I guess that there are some specific flags to optimize the C++ code. I
am searching the Help now, but so far without success. I would
appreciate any hint in this respect, as it is quite painful to lose a
factor of 3 in a simple loop.
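
For readers who cannot open the link, the kind of loop being timed can
be sketched as follows. This is only a minimal sketch and not the
actual direct.cc: the function name multiply_naive, the flat row-major
layout, and the use of std::vector<double> are assumptions made for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the code from direct.cc.
// Computes C = A * B for row-major matrices stored in flat vectors:
// A is n x k, B is k x m, and the returned C is n x m.
std::vector<double> multiply_naive(const std::vector<double>& A,
                                   const std::vector<double>& B,
                                   std::size_t n, std::size_t k, std::size_t m)
{
    assert(A.size() == n * k && B.size() == k * m);
    std::vector<double> C(n * m, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            double sum = 0.0;
            for (std::size_t l = 0; l < k; ++l) {
                // Each access goes through std::vector::operator[].
                sum += A[i * k + l] * B[l * m + j];
            }
            C[i * m + j] = sum;
        }
    }
    return C;
}
```

Every element access in the inner loop goes through
std::vector::operator[], so any per-access overhead added by the
library is paid n*k*m times.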

Best wishes,

Evgenii

user923005

May 22, 2008, 12:08:46 PM

The C++ version is performing vector allocations. It is not the same
as your other versions which put the data on the stack.

P.S.
If you make them static arrays, you won't need such an awful stack.
Since the size is not dynamic, static arrays make sense here (unless
you want to compare other sizes in which case you should use malloc()
for C).
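
The static-array suggestion could look roughly like the sketch below.
The size N = 256 and the function name multiply_static are assumptions
chosen to keep the example quick; the tests in this thread use 1000.

```cpp
#include <cassert>
#include <cstddef>

// Assumed size for illustration only; the thread uses 1000.
const std::size_t N = 256;

// Static storage duration: the arrays live outside the stack, so no
// enlarged stack (-Wl,--stack=... or -STACK:...) is required.
static double A[N][N];
static double B[N][N];
static double C[N][N];

// Hypothetical driver: fills A with 1.0 and B with 2.0, computes
// C = A * B with the naive triple loop, and returns the last element.
double multiply_static()
{
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            A[i][j] = 1.0;
            B[i][j] = 2.0;
        }
    }
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < N; ++k) {
                sum += A[i][k] * B[k][j];
            }
            C[i][j] = sum;
        }
    }
    // For this input every entry of C has the same value.
    assert(C[0][0] == C[N - 1][N - 1]);
    return C[N - 1][N - 1];
}
```

With malloc() or std::vector the size can instead be chosen at run
time, at the cost of one allocation per matrix.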

My G++ performance is not like yours. Here are my timings on 2.2 GHz
AMD running Windows 2003 (32 bit OS):

Your makefile, but CXXFLAG = -s -O3 -DUSECLOCK:
C:\math\matmul>direct-cc.exe 1000 1000 1000
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 9.609 s

Microsoft Visual C++ with flags:
/Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D
"USECLOCK" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MT /Zp16 /GS-
/arch:SSE /Fo"Release\\" /Fd"Release\vc80.pdb" /W4 /nologo /c /Wp64
/Zi /TP /errorReport:prompt
C:\math\matmul>direct-noprof.exe 1000 1000 1000
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 7.391 s

As above, with profile guided optimization:
C:\math\matmul>direct-profile.exe 1000 1000 1000
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 6.891 s

These use your makefile without changes, but I used gfortran and not
g77:
C:\math\matmul>direct1-c.exe
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 5.125000 s

C:\math\matmul>direct1-f.exe
time for C( 1000 , 1000 ) = A( 1000 , 1000 ) B( 1000 , 1000 ) is 12.640625 s

C:\math\matmul>direct2-c.exe
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 5.172000 s

C:\math\matmul>direct2-f.exe
time for C( 1000 , 1000 ) = A( 1000 , 1000 ) B( 1000 , 1000 ) is 5.1406250 s

Here I re-ran the fortran tests with g95:
g95 -s -O3 direct1.f -o direct1-f.exe
direct1-f.exe
time for C( 1000 , 1000 ) = A( 1000 , 1000 ) B( 1000 , 1000 ) is 13.140625 s
g95 -s -O3 direct2.f -o direct2-f.exe
direct2-f.exe
time for C( 1000 , 1000 ) = A( 1000 , 1000 ) B( 1000 , 1000 ) is 5.109375 s

It takes 2 seconds on that same machine to do a 1000x1000 C++ matrix
multiply using Strassen multiplication.

Evgenii Rudnyi

May 22, 2008, 2:53:40 PM

On May 22, 6:08 pm, user923005 <dcor...@connx.com> wrote:
> On May 22, 2:16 am, Evgenii Rudnyi <use...@rudnyi.ru> wrote:

...

> The C++ version is performing vector allocations. It is not the same
> as your other versions which put the data on the stack.

This is true, but memory allocation happens only once. Allocating
three arrays should not take much time, so the difference should be
very small.
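
One way to check this is to time the two phases separately. Below is a
minimal sketch, assuming std::clock for timing (the USECLOCK define
hints at clock(), but the real direct.cc may differ); PhaseTimes and
time_phases are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical result type: CPU seconds per phase plus one result
// element, returned so the multiplication is not dead code.
struct PhaseTimes {
    double allocate;
    double multiply;
    double last_element;
};

PhaseTimes time_phases(std::size_t n)
{
    assert(n > 0);
    PhaseTimes t;

    // Phase 1: allocate and initialize the three n x n matrices.
    std::clock_t start = std::clock();
    std::vector<double> A(n * n, 1.0), B(n * n, 1.0), C(n * n, 0.0);
    t.allocate = double(std::clock() - start) / CLOCKS_PER_SEC;

    // Phase 2: the naive triple loop.
    start = std::clock();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
    t.multiply = double(std::clock() - start) / CLOCKS_PER_SEC;

    t.last_element = C[n * n - 1];
    return t;
}
```

Calling time_phases(1000) and printing both fields shows directly how
much of the total time the allocation accounts for.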

> P.S.
> If you make them static arrays, you won't need such an awful stack.
> Since the size is not dynamic, static arrays make sense here (unless
> you want to compare other sizes in which case you should use malloc()
> for C).

You are right. Static arrays would be simpler. But I guess this should
not affect the performance anyway.

> My G++ performance is not like yours. Here are my timings on 2.2 GHz
> AMD running Windows 2003 (32 bit OS):
>
> Your makefile, but CXXFLAG = -s -O3 -DUSECLOCK:
> C:\math\matmul>direct-cc.exe 1000 1000 1000
> time for C(1000,1000) = A(1000,1000) B(1000,1000) is 9.609 s
>
> Microsoft Visual C++ with flags:
> /Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D
> "USECLOCK" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MT /Zp16 /GS-
> /arch:SSE /Fo"Release\\" /Fd"Release\vc80.pdb" /W4 /nologo /c /Wp64
> /Zi /TP /errorReport:prompt
> C:\math\matmul>direct-noprof.exe 1000 1000 1000
> time for C(1000,1000) = A(1000,1000) B(1000,1000) is 7.391 s

Thanks a lot. I will try these flags. I thought that -O2 includes
everything, but that seems not to be the case.

Thank you for the Strassen suggestion. I guess that if you call DGEMM
from ATLAS or another optimized BLAS on your computer, it should take
less than one second.

My main goal here was just to see how the compiler optimizes the loops.

Evgenii Rudnyi

May 22, 2008, 3:22:56 PM

On May 22, 6:08 pm, user923005 <dcor...@connx.com> wrote:

> On May 22, 2:16 am, Evgenii Rudnyi <use...@rudnyi.ru> wrote:

...

> Your makefile, but CXXFLAG = -s -O3 -DUSECLOCK:
> C:\math\matmul>direct-cc.exe 1000 1000 1000
> time for C(1000,1000) = A(1000,1000) B(1000,1000) is 9.609 s
>
> Microsoft Visual C++ with flags:
> /Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D
> "USECLOCK" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MT /Zp16 /GS-
> /arch:SSE /Fo"Release\\" /Fd"Release\vc80.pdb" /W4 /nologo /c /Wp64
> /Zi /TP /errorReport:prompt
> C:\math\matmul>direct-noprof.exe 1000 1000 1000
> time for C(1000,1000) = A(1000,1000) B(1000,1000) is 7.391 s

Unfortunately your flags did not help on my system. I see the same
difference: about 2 s with g++ versus more than 6 s with VC++. Which
versions of gcc and VC++ do you use? I use gcc 3.3 and VC++ 2005
Express Edition.

Could it be that Express Edition does not make complete optimization?

user923005

May 22, 2008, 7:55:35 PM

dcorbit@DCORBIT64 ~
$ gcc --version
gcc.exe (GCC) 3.2 (mingw special 20020817-1)
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

dcorbit@DCORBIT64 ~
$ g++ --version
g++.exe (GCC) 3.2 (mingw special 20020817-1)
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

dcorbit@DCORBIT64 ~
$ gfortran --version
GNU Fortran (GCC) 4.3.0
Copyright (C) 2008 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

dcorbit@DCORBIT64 ~
$ g95 --version
G95 (GCC 4.0.3 (g95 0.91!) Feb 27 2008)
Copyright (C) 2002-2005 Free Software Foundation, Inc.

G95 comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of G95
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

dcorbit@DCORBIT64 ~
$

Microsoft Visual Studio 2005
Version 8.0.50727.762 (SP.050727-7600)
Microsoft .NET Framework
Version 2.0.50727 SP1

Installed Edition: Enterprise

Microsoft Visual Basic 2005 77642-113-3000004-41589
Microsoft Visual Basic 2005

Microsoft Visual C# 2005 77642-113-3000004-41589
Microsoft Visual C# 2005

Microsoft Visual C++ 2005 77642-113-3000004-41589
Microsoft Visual C++ 2005

Microsoft Visual J# 2005 77642-113-3000004-41589
Microsoft Visual J# 2005

Microsoft Visual Studio 2005 Tools for Applications
77642-113-3000004-41589
Microsoft Visual Studio 2005 Tools for Applications

Microsoft Visual Studio Tools for Office 77642-113-3000004-41589
Microsoft Visual Studio Tools for the Microsoft Office System

Microsoft Visual Web Developer 2005 77642-113-3000004-41589
Microsoft Visual Web Developer 2005

Microsoft Web Application Projects 2005 77642-113-3000004-41589
Microsoft Web Application Projects 2005
Version 8.0.50727.762

Visual Studio 2005 Team Edition for Developers
77642-113-3000004-41589
Microsoft Visual Studio 2005 Team Edition for Software Developers

Crystal Reports AAC60-G0CSA4B-V7000AY
Crystal Reports for Visual Studio 2005

DevPartner Studio 8.0.0.2999
Compuware DevPartner Studio
Copyright © 2005 Compuware Corporation. All rights reserved.
www.compuware.com

IBM Database Add-Ins 9.1.1.73
IBM Database Add-Ins for Visual Studio 2005. Copyright(c) IBM
Corporation. All rights reserved

Microsoft Visual Studio 2005 Professional Edition - ENU Service Pack 1
(KB926601)
This service pack is for Microsoft Visual Studio 2005 Professional
Edition - ENU.
If you later install a more recent service pack, this service pack
will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/926601

Microsoft Visual Web Developer 2005 Express Edition - ENU Service Pack
1 (KB926751)
This service pack is for Microsoft Visual Web Developer 2005 Express
Edition - ENU.
If you later install a more recent service pack, this service pack
will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/926751

Neumont ORM Architect 1.0.605.525 2006-05CTP
NORMA - Neumont Object-Role Modeling Architect

Security Update for Microsoft Visual Studio 2005 Professional Edition
- ENU (KB937061)
This Security Update is for Microsoft Visual Studio 2005 Professional
Edition - ENU.
If you later install a more recent service pack, this Security Update
will be uninstalled automatically.
For more information, visit http://support.microsoft.com/kb/937061

WorkflowServer Designer 4.5.2.0
WorkflowServer Designer

> Could it be that Express Edition does not make complete optimization?

Yes, for sure that is a problem. From http://www.thefreecountry.com/compilers/cpp.shtml
we have this:
"Microsoft .NET Framework Software Development Kit (SDK) / Free
Microsoft Visual C++ Compiler
The Microsoft Visual C/C++ command line compiler, along with C#,
VB.NET and JScript.NET, is available from Microsoft for download for
free. You will also need to download the Microsoft Windows Platform
SDK which contains the Windows headers and libraries for the
compilers. The command line compiler (at the time this was written/
reviewed) does not have an optimizer (or at least, not the optimizer
that ships with the Professional version)."

The Intel compiler makes very fast matrix code. I do not have the
latest version.

Evgenii Rudnyi

May 23, 2008, 2:25:29 PM

On 23 May, 01:55, user923005 <dcor...@connx.com> wrote:
...

> > Could it be that Express Edition does not make complete optimization?
>

> Yes, for sure that is a problem. From http://www.thefreecountry.com/compilers/cpp.shtml

> we have this:
> "Microsoft .NET Framework Software Development Kit (SDK) / Free
> Microsoft Visual C++ Compiler
> The Microsoft Visual C/C++ command line compiler, along with C#,
> VB.NET and JScript.NET, is available from Microsoft for download for
> free. You will also need to download the Microsoft Windows Platform
> SDK which contains the Windows headers and libraries for the
> compilers. The command line compiler (at the time this was written/
> reviewed) does not have an optimizer (or at least, not the optimizer
> that ships with the Professional version)."

Thanks a lot. This information is very useful. One could expect that
the Express Edition is not fully functional. On the other hand, you
may want to update g++ if you use it often. It seems that the
optimization was improved in 3.3. It would be interesting to see what
happens with gcc 4.

> The Intel compiler makes very fast matrix code. I do not have the
> latest version.

Mine is not the latest either (version 8). However, I guess that for
very fast matrix code one needs Intel MKL (Intel's optimized BLAS);
without it I do not expect the compiler alone to improve the situation
significantly.

Evgenii Rudnyi

May 27, 2008, 2:33:57 PM

On May 23, 1:55 am, user923005 <dcor...@connx.com> wrote:
...
> Yes, for sure that is a problem. From http://www.thefreecountry.com/compilers/cpp.shtml

> we have this:
> "Microsoft .NET Framework Software Development Kit (SDK) / Free
> Microsoft Visual C++ Compiler
> The Microsoft Visual C/C++ command line compiler, along with C#,
> VB.NET and JScript.NET, is available from Microsoft for download for
> free. You will also need to download the Microsoft Windows Platform
> SDK which contains the Windows headers and libraries for the
> compilers. The command line compiler (at the time this was written/
> reviewed) does not have an optimizer (or at least, not the optimizer
> that ships with the Professional version)."

It turns out that this is not the case. I asked at
microsoft.public.vc.language

http://groups.google.com/group/microsoft.public.vc.language/browse_frm/thread/874563e08c779048

and it turns out that VC++ uses safe (checked) iterators by default.
Defining _SECURE_SCL=0 solves the problem:

cl -EHsc -O2 -D_SECURE_SCL=0 -DUSECLOCK direct.cc

Well.
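
In source form the same setting can be sketched as below. The macro has
to be defined before the first standard header is included, and with
the same value in every translation unit and library that shares
containers. The guard on _MSC_VER < 1600 (Visual C++ 2005 and 2008) and
the helper multiply_raw are assumptions for illustration; multiply_raw
shows an alternative that avoids checked operator[] in the hot loop by
indexing through raw pointers, and it is not taken from direct.cc.

```cpp
// Assumption: only Visual C++ 2005/2008 need this; other compilers
// skip the definition. It must come before any standard header.
#if defined(_MSC_VER) && _MSC_VER < 1600
#define _SECURE_SCL 0
#endif

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alternative: C = A * B for n x n row-major matrices,
// with the element pointers fetched once outside the loops.
void multiply_raw(const std::vector<double>& A,
                  const std::vector<double>& B,
                  std::vector<double>& C, std::size_t n)
{
    assert(n > 0 && A.size() == n * n && B.size() == n * n &&
           C.size() == n * n);
    const double* a = &A[0];
    const double* b = &B[0];
    double* c = &C[0];
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

The raw-pointer loop gives up the bounds checks entirely, so the size
assertions at the top are the only protection left.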

Bill Shortall

Jun 15, 2008, 7:17:47 PM

I have written matrix-matrix multiply functions
for ppLinear that will give a run time of 2.2 secs
for N=1000 and double precision. I am using the
Microsoft VC6 compiler and sometimes the Visual
C++ toolkit (from xona.com). I therefore don't
think this is a C versus C++ issue.
When I tried to upgrade from VC6 to Visual C++
.Net, there was a huge slowdown in execution speed.
After struggling a while I found that the "Standard"
version did not include the -O2 optimization. For that
you need to purchase the $500 "professional" version.
Maybe Microsoft did it again?
Are you sure the O2 optimization is really there?
Make sure you are operating in Release mode, not
Debug.
Does the executable from your C++ version
seem much, much bigger than required? You might be
acquiring some .NET overhead.
good luck....Bill
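
To double-check which configuration a binary was really built with, a
small diagnostic like the following can be compiled with the same flags
as the benchmark. It is a sketch only: describe_build is a hypothetical
helper, and it reports just a few predefined macros (NDEBUG, _DEBUG and
_M_CEE on Visual C++, __OPTIMIZE__ on GCC, and _SECURE_SCL where the
library defines it).

```cpp
#include <string>

// Hypothetical helper: summarizes a few build-related macros that were
// visible when this translation unit was compiled.
std::string describe_build()
{
    std::string s;
#ifdef NDEBUG
    s += "NDEBUG defined (assert disabled); ";
#else
    s += "NDEBUG not defined (assert enabled); ";
#endif
#ifdef _DEBUG
    s += "_DEBUG defined (Visual C++ debug runtime); ";
#endif
#ifdef __OPTIMIZE__
    s += "__OPTIMIZE__ defined (GCC is optimizing); ";
#endif
#ifdef _M_CEE
    s += "_M_CEE defined (compiled with /clr, i.e. .NET); ";
#endif
#ifdef _SECURE_SCL
#if _SECURE_SCL
    s += "_SECURE_SCL is on (checked iterators); ";
#else
    s += "_SECURE_SCL is off; ";
#endif
#endif
    return s;
}
```

Printing the returned string from main() of the benchmark shows at a
glance whether a Debug or /clr build was timed by mistake.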

"Evgenii Rudnyi" <use...@rudnyi.ru> wrote in message
news:cb76fea3-e7a3-42f7...@y21g2000hsf.googlegroups.com...

Evgenii Rudnyi

Jun 16, 2008, 2:06:56 PM

On Jun 16, 1:17 am, "Bill Shortall" <wshort...@centurytel.net> wrote:
> I have written matrix-matrix multiply functions
> for ppLinear that will give a run time of 2.2 secs
> for N=1000 and double precision. I am using the
> Microsoft VC6 compiler and sometimes the Visual
> C++ toolkit (from xona.com). I therefore don't
> think this is a C versus C++ issue.
> When I tried to upgrade from VC6 to Visual C++
> .Net, there was a huge slowdown in execution speed.
> After struggling a while I found that the "Standard"
> version did not include the -O2 optimization. For that
> you need to purchase the $500 "professional" version.
> Maybe Microsoft did it again?
> Are you sure the O2 optimization is really there?
> Make sure you are operating in Release mode, not
> Debug.
> Does the executable from your C++ version
> seem much, much bigger than required? You might be
> acquiring some .NET overhead.
> good luck....Bill

Bill,

Thanks for your comments. I do not use NET and I have actually found
the reason at microsoft.public.vc.language

http://groups.google.com/group/microsoft.public.vc.language/browse_frm/thread/874563e08c779048

It happens that VC++ by default uses safe iterators. _SECURE_SCL=0
solves the problem:

cl -EHsc -O2 -D_SECURE_SCL=0 -DUSECLOCK direct.cc

So, the answer was that by default VC++ is secure but slow.

Evgenii

http://MatrixProgramming.com

Ye Gu

Jun 23, 2008, 8:24:49 PM

On Jun 17, 2:06 am, Evgenii Rudnyi <use...@rudnyi.ru> wrote:
> On Jun 16, 1:17 am, "Bill Shortall" <wshort...@centurytel.net> wrote:
>
>
>
> > I have written matrix-matrix multiply functions
> > for ppLinear that will give a run time of 2.2 secs
> > for N=1000 and double precision. I am using the
> > Microsoft VC6 compiler and sometimes the Visual
> > C++ toolkit (from xona.com). I therefore don't
> > think this is a C versus C++ issue.
> > When I tried to upgrade from VC6 to Visual C++
> > .Net, there was a huge slowdown in execution speed.
> > After struggling a while I found that the "Standard"
> > version did not include the -O2 optimization. For that
> > you need to purchase the $500 "professional" version.
> > Maybe Microsoft did it again?
> > Are you sure the O2 optimization is really there?
> > Make sure you are operating in Release mode, not
> > Debug.
> > Does the executable from your C++ version
> > seem much, much bigger than required? You might be
> > acquiring some .NET overhead.
> > good luck....Bill
>
> Bill,
>
> Thanks for your comments. I do not use NET and I have actually found
> the reason at microsoft.public.vc.language
>

> http://groups.google.com/group/microsoft.public.vc.language/browse_fr...

>
> It happens that VC++ by default uses safe iterators. _SECURE_SCL=0
> solves the problem:
>
> cl -EHsc -O2 -D_SECURE_SCL=0 -DUSECLOCK direct.cc
>
> So, the answer was that by default VC++ is secure but slow.
>
> Evgenii
>
> http://MatrixProgramming.com

Evgenii,
Thank you for the tip about turning off _SECURE_SCL. However, after I
turned it off via my project properties, a vector routine (v.back())
in OpenMesh that I use in my project reports an access violation.
Am I missing something?

Thx!
coo

user923005

Jun 23, 2008, 10:05:07 PM

Probably, there is a bug in your code. You went outside the bounds of
your container.
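
One way to hunt for such an out-of-bounds access independently of
_SECURE_SCL is to temporarily use at(), which the C++ standard requires
to throw std::out_of_range for an invalid index in every build
configuration. A minimal sketch; index_is_valid is a hypothetical
helper:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reports whether index i can be accessed in v.
// It relies on at(), which always performs the range check.
bool index_is_valid(const std::vector<int>& v, std::size_t i)
{
    try {
        (void)v.at(i);   // throws std::out_of_range if i >= v.size()
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

In real code the suspicious v[i] would simply be replaced by v.at(i)
while debugging and changed back once the bug is found.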

Ye Gu

Jun 25, 2008, 3:09:42 AM

If it went outside the bounds, it should have been reported with
_SECURE_SCL turned on, but it was not...
curious...

Evgenii Rudnyi

Jun 25, 2008, 1:46:28 PM

On Jun 25, 9:09 am, Ye Gu <cooy...@gmail.com> wrote:
...
> > > > It happens that VC++ by default uses safe iterators. _SECURE_SCL=0
> > > > solves the problem:
>
> > > > cl -EHsc -O2 -D_SECURE_SCL=0 -DUSECLOCK direct.cc
>
> > > > So, the answer was that by default VC++ is secure but slow.
>
> > > > Evgenii
>
> > > > http://MatrixProgramming.com
>
> > > Evgenii,
> > > Thank you for the tips to turn off _SECURE_SCL. However, after I turn
> > > off _SECURE_SCL via my project properties, a vector iterator
> > > routine( v.back() ) in OpenMesh that I use in my project reports
> > > access violation.
> > > Am I missing something?
>
> > Probably, there is a bug in your code. You went outside the bounds of
> > your container.
>
> if it went outside the bound, it should have been reported when
> turning on _SECURE_SCL, but it does not...
> curious...

With iterators there can be subtle bugs. For example, you may by
chance use iterators that belong to another container, or something
similar. Presumably _SECURE_SCL should detect this as well, but theory
and practice do not always coincide. It could also be that there are
other defines that force more checks.
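
Two things may be worth checking here; both are guesses, since the
thread does not show the failing code. First, calling back() on an
empty vector is undefined behavior whether or not checks are enabled.
Second, with Visual C++ 2005 every translation unit and library that
shares containers (here presumably OpenMesh as well) has to be compiled
with the same _SECURE_SCL value, otherwise the container layouts can
disagree. The first point can be guarded with a small helper;
last_or_null is a hypothetical name:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a pointer to the last element of v, or
// a null pointer when v is empty, so back() is never called on an
// empty vector.
template <typename T>
const T* last_or_null(const std::vector<T>& v)
{
    if (v.empty()) {
        return NULL;
    }
    return &v.back();
}
```

A caller then tests the returned pointer before dereferencing it
instead of calling v.back() unconditionally.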