incredible slowdown switching to 64 bit g++

nandor...@gmail.com

Nov 25, 2008, 1:21:23 AM

I have a fairly complex C++ program that uses a lot of STL, number
crunching using doubles and the lapack library (-llapack -lblas -lg2c
-lm). The code works fine on any 32 bit unix machine compiled with g++,
but when I try it on a 64 bit machine a running time of 10 seconds
becomes 15 minutes. The code is complex; I could not create a simple
subset that reproduces this problem. I tried this on several 32 and 64
bit machines. The machines are comparable in speed. I use -O2
optimization. The program is not swapping to disk. What could cause
this incredible slowdown?

Some suspects:

- The lapack library
- Tolerances I use for floating point comparisons
- Large vector<vector<int > > variables (even vector<vector<vector< > > > variables)
- Need a compiler option on the 64 bit machines?
- Random number generator

Kai-Uwe Bux

Nov 25, 2008, 1:46:00 AM

nandor...@gmail.com wrote:

> I have a fairly complex C++ program that uses a lot of STL, number
> crunching using doubles and the lapack library (-llapack -lblas -lg2c
> -lm). The code works fine on any 32 bit unix machine compiled with g++
> but when I try it on a 64 bit machine a running time of 10 seconds
> becomes 15 minutes. The code is complex, I could not create a simple
> subset that produces this problem. I tried this on several 32 and 64
> bit machines. The speed of the machines are comparable. I use -O2
> optimization. The program is not swapping to disk. What could cause
> this incredible slowdown?

[snip]

Just a quick question: does the program do the same thing on a 64bit machine
as on a 32bit machine? Has testing shown that for the same input you get
the same output?


Best

Kai-Uwe Bux

Ian Collins

Nov 25, 2008, 2:03:17 AM

nandor...@gmail.com wrote:
> I have a fairly complex C++ program that uses a lot of STL, number
> crunching using doubles and the lapack library (-llapack -lblas -lg2c
> -lm). The code works fine on any 32 bit unix machine compiled with g++
> but when I try it on a 64 bit machine a running time of 10 seconds
> becomes 15 minutes. The code is complex, I could not create a simple
> subset that produces this problem. I tried this on several 32 and 64
> bit machines. The speed of the machines are comparable. I use -O2
> optimization. The program is not swapping to disk. What could cause
> this incredible slowdown?
>
You'll have to profile to find out. It's not uncommon for 64 bit
executables to be slower when the code has been tuned for 32bit. That's
one reason why 32 bit executables are still common on 64 bit platforms.

> Some suspects:
>
> -The lapack library
> - Tolerances I use for floating point comparisons

Shouldn't matter.

> - Large vector<vector<int > > variables ( even vector<vector<vector< > > > variable )

Shouldn't matter. A heavy use of long might.
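For illustration, a minimal sketch of what actually changes between the two builds (the function names `report_data_model` and `payload_bytes` are hypothetical). On Unix-like systems a 32-bit g++ build typically uses the ILP32 model (int, long and pointers are 4 bytes) and a 64-bit build uses LP64 (int stays 4 bytes, long and pointers grow to 8), so arrays of `long` or pointers double in size while arrays of `int` and `double` do not.

```cpp
#include <cstddef>
#include <string>

// Describe the data model of the current build. On Unix-like systems a
// 32-bit build is usually ILP32 and a 64-bit build is usually LP64.
std::string report_data_model() {
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void*) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void*) == 8)
        return "LP64";
    return "other";  // e.g. LLP64 on 64-bit Windows
}

// Bytes occupied by the payload of n elements of type T. For T = long this
// doubles between ILP32 and LP64; for T = int or double it does not change.
template <typename T>
std::size_t payload_bytes(std::size_t n) {
    return n * sizeof(T);
}
```

Calling `report_data_model()` in each build is a quick way to confirm which model a binary was compiled for; with g++ on 64-bit Linux it should report `LP64`.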

Try a gcc or Linux group or maybe comp.unix.programmer.

--
Ian Collins

nandor...@gmail.com

Nov 25, 2008, 2:27:19 AM

> Just a quick question: does the program do the same thing on a 64bit machine
> as on a 32bit machine? Has testing shown that for the same input you get
> the same output?

There are small differences in the values of doubles but I guess
that's not unexpected.
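To check that the two builds really agree, the outputs can be compared within a tolerance rather than bit for bit. A minimal sketch of such a check; the name `nearly_equal` and the default tolerances are assumptions, to be tuned to the magnitudes in the actual problem:

```cpp
#include <algorithm>
#include <cmath>

// Compare two results with a relative tolerance, so that last-bit
// differences between a 32-bit and a 64-bit build are not reported as
// mismatches. rel_tol and abs_tol are illustrative defaults only.
bool nearly_equal(double a, double b,
                  double rel_tol = 1e-9, double abs_tol = 1e-12) {
    const double diff = std::fabs(a - b);
    if (diff <= abs_tol) return true;  // both values essentially zero
    return diff <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}
```

The absolute floor matters for values near zero, where a purely relative test would reject tiny differences that are only rounding noise.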


nandor...@gmail.com

Nov 25, 2008, 2:31:07 AM

> You'll have to profile to find out.  

I did try profiling but I could not make much sense of it. Generally it
seemed like everything takes somewhat longer.

> It's not uncommon for 64 bit
> executables to be slower when the code has been tuned for 32bit.  That's
> one reason why 32 bit executables are still common on 64 bit platforms.

But could it make such a huge difference? What does it mean to be tuned
for 32 bit? The code does not depend on it. Could it be that the STL
library or the lapack library is optimized for 32 bit?

> Shouldn't matter.  A heavy use of long might.

No long in the code.

nandor...@gmail.com

Nov 25, 2008, 2:36:04 AM

> It's not uncommon for 64 bit
> executables to be slower when the code has been tuned for 32bit.  That's
> one reason why 32 bit executables are still common on 64 bit platforms.

I am not trying to run the executable compiled on the 32 bit machine.
I recompile everything on the 64 bit machines.

Ian Collins

Nov 25, 2008, 2:45:23 AM

Why?

--
Ian Collins

nandor...@gmail.com

Nov 25, 2008, 2:57:21 AM

> > I am not trying to run the executable compiled on the 32 bit machine.
> > I recompile everything on the 64 bit machines.
>
> Why?

It is my own code. Since I have the source code, it makes sense to
recompile and hope it will be optimized for the new machine. I don't
know if the 32 bit executable would run on the 64 bit machines, but
perhaps I should try that.

Could this piece of code be responsible?

extern "C"
{
void dsyev_ (const char *jobz,
const char *uplo,
const int &n,
double a[],
const int &lda,
double w[], double work[], int &lwork, int &info);
}

int
dsyev (const vector < vector < double > >&mat, vector < double >&eval,
       vector < vector < double > >&evec)
{
...
dsyev_ ("V", "U", n, a, n, w, work, lwork, info);
...
}

This is how I use the Fortran lapack library. Perhaps the type sizes
change differently in C++ and in Fortran when going from 32 bit to 64 bit.
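The declaration above passes `int` where the Fortran routine expects INTEGER and passes `a` as a flat array. A minimal sketch of the two pieces that are easy to get wrong, assuming the library was built with the default 4-byte INTEGER (the usual g77/gfortran default) and that `mat` is square. The helper name `to_column_major` is hypothetical, and the elided parts of `dsyev` are not reconstructed here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumption: Fortran INTEGER is 4 bytes (default g77/gfortran build).
// C++ int is 4 bytes under both ILP32 and LP64, so this should hold on
// 32-bit and 64-bit Linux alike; compilation fails if it does not.
static_assert(sizeof(int) == 4, "Fortran INTEGER is assumed to be 4 bytes");

// Pack a square matrix stored as vector<vector<double>> (row i = mat[i])
// into the contiguous column-major array that routines such as dsyev_
// expect: element (i, j) goes to a[i + j * n].
std::vector<double> to_column_major(const std::vector<std::vector<double>>& mat) {
    const std::size_t n = mat.size();
    std::vector<double> a(n * n);
    for (std::size_t i = 0; i < n; ++i) {
        if (mat[i].size() != n)
            throw std::invalid_argument("matrix must be square");
        for (std::size_t j = 0; j < n; ++j)
            a[i + j * n] = mat[i][j];
    }
    return a;
}
```

For the symmetric input of dsyev the transpose is the same matrix, so the order matters less on the way in; but with jobz = "V" the eigenvectors written back into `a` are stored column by column, so the same index mapping applies when unpacking them.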

Ian Collins

Nov 25, 2008, 3:04:05 AM

nandor...@gmail.com wrote:
>>> I am not trying to run the executable compiled on the 32 bit machine.
>>> I recompile everything on the 64 bit machines.
>> Why?
>
> It is my own code. Since I have the source code it makes sense to
> recompile and hope it will be optimized for the new machine. I don't
> know if the 32 bit executable would run on the 64 bit machines but
> perhaps I should try that.
>
Under any decent OS, they should. I don't use 32 bit systems any more
and I seldom build 64 bit executables.

> Could this piece of code be responsible?
>
> extern "C"
> {
> void dsyev_ (const char *jobz,
> const char *uplo,
> const int &n,
> double a[],
> const int &lda,
> double w[], double work[], int &lwork, int &info);
> }
>

Doubles or ints shouldn't be an issue.

You should try asking on a more specialised group. You should be able
to find a 64 bit porting guide for your platform.

--
Ian Collins

Maxim Yegorushkin

Nov 25, 2008, 6:51:34 AM

On Nov 25, 6:21 am, nandor.sie...@gmail.com wrote:
> I have a fairly complex C++ program that uses a lot of STL, number
> crunching using doubles and the lapack library (-llapack -lblas -lg2c
> -lm). The code works fine on any 32 bit unix machine compiled with g++
> but when I try it on a 64 bit machine a running time of 10 seconds
> becomes 15 minutes. The code is complex, I could not create a simple
> subset that produces this problem. I tried this on several 32 and 64
> bit machines. The speed of the machines are comparable. I use -O2
> optimization. The program is not swapping to disk. What could cause
> this incredible slowdown?

[]

Have you tried comparing 32 and 64-bit versions compiled on the very
same machine? Use -m32 compiler switch to compile a 32-bit version.

--
Max

James Kanze

Nov 25, 2008, 7:03:14 AM

Not really. Both the 64 bit machine and the 32 bit one are
probably using IEEE doubles. Even on a 32 bit machine, a double
is normally 64 bits.
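These properties can be checked at compile time. A minimal sketch (the function name `float_eval_method` is hypothetical) that also reports `FLT_EVAL_METHOD` from `<cfloat>`, one plausible source of the small numerical differences: 32-bit x86 builds that use the x87 FPU typically evaluate intermediates in extended precision (FLT_EVAL_METHOD 2), whereas x86-64 builds use SSE2 and round each operation to double (FLT_EVAL_METHOD 0).

```cpp
#include <cfloat>
#include <limits>

// Both builds should use the same 64-bit IEEE 754 double format...
static_assert(std::numeric_limits<double>::is_iec559, "double is IEEE 754");
static_assert(std::numeric_limits<double>::digits == 53, "53-bit significand");

// ...but intermediate results may be evaluated differently. FLT_EVAL_METHOD
// is 0 when double expressions are evaluated in double precision and 2 when
// they are evaluated in long double precision; -1 means indeterminable.
int float_eval_method() {
    return FLT_EVAL_METHOD;
}
```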

Compiling in 64 bit mode will often result in some reduction in
speed, because of larger program size, and thus poorer locality.
I can't imagine this representing more than a difference of
about 10 or 20 percent, however, and I would expect it usually
to be a lot less.

Have you profiled the two cases, to see which functions have
become significantly slower?
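When a profiler's output is hard to interpret, coarse manual timing of the major phases in both builds can narrow things down. A minimal sketch using `std::chrono`; the class name `ScopedTimer` is hypothetical, and wall-clock timing of whole sections is no substitute for a real profiler.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <utility>

// Prints the wall-clock time spent in the enclosing scope when it ends.
// Wrapping the same sections in both builds shows which ones slowed down.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string label, std::ostream& out = std::cerr)
        : label_(std::move(label)), out_(out),
          start_(std::chrono::steady_clock::now()) {}

    // Copying would print the same section twice, so forbid it.
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

    ~ScopedTimer() { out_ << label_ << ": " << elapsed_ms() << " ms\n"; }

    // Milliseconds elapsed since construction.
    double elapsed_ms() const {
        const std::chrono::duration<double, std::milli> d =
            std::chrono::steady_clock::now() - start_;
        return d.count();
    }

private:
    std::string label_;
    std::ostream& out_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage: `{ ScopedTimer t("eigen decomposition"); /* call dsyev here */ }` prints one line to std::cerr when the block exits. Comparing those lines from the 32-bit and 64-bit builds shows whether one phase or everything slowed down.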

--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

joseph cook

Nov 25, 2008, 8:13:10 AM

On Nov 25, 1:21 am, nandor.sie...@gmail.com wrote:
> The program is not swapping to disk. What could cause
> this incredible slowdown?
>
> Some suspects:
>
> -The lapack library
> - Tolerances I use for floating point comparisons
> - Large vector<vector<int > > variables ( even vector<vector<vector< > > > variable )
>
> - Need a compiler option on the 64 bit machines?
> - Random number generator

Your #1 suspect you have not mentioned. You switched out the
machine! The amount of memory and the architecture (how much L1
cache for example) will make major differences. I'm sure you got
this new machine because it has better specs, but maybe it is short on
memory for the loading. (Is this program competing with a Windows O/S
while it runs now, or something similar?) Maybe there is an
architecture-specific flag you should be adding on these new
machines?

Joe

gpderetta

Nov 26, 2008, 9:44:15 AM

Random guess: maybe the lapack/blas library uses hand optimized
assembler in 32 bit mode while it uses unoptimized C code (or fortran
or whatever) in 64 bit mode. But I do not think this is enough to explain
the slow down.

--
Giovanni P. Deretta

Lionel B

Nov 26, 2008, 10:33:12 AM

That was my thought; many default OS installations will supply an
unoptimised "reference" BLAS/LAPACK.

> But I do not think this is enough to explain the slow down.

I have seen pretty drastic performance hits with reference BLAS/LAPACK as
compared with a vendor-supplied optimised version, or something like
ATLAS, although not quite that dramatic...

[BTW, is there something odd about the wrapping/flowing in the previous
article? My newsreader seems to display it as blank unless I force it to
wrap.]

--
Lionel B

nandor...@gmail.com

Nov 27, 2008, 12:09:53 AM

> Have you tried comparing 32 and 64-bit versions compiled on the very
> same machine? Use -m32 compiler switch to compile a 32-bit version.
> Max

I compiled the code on the 32-bit machine and ran it on the 64-bit
machine. It runs without the extreme slowdown.

I was not able to compile on the 64-bit machine using -m32. It
produces the error message:
/usr/bin/ld: cannot open gcrt1.o: No such file or directory
collect2: ld returned 1 exit status

nandor...@gmail.com

Nov 27, 2008, 12:12:40 AM

> Your #1 suspect you have not mentioned.  You switched out the
> machine!   The amount of memory and the architecture (how much L1
> cache for example) will make major differences.   I'm sure you got
> this new machine because it has better specs, but maybe it is short on
> memory for the loading.  (Is this program competing with a Windows O/S
> while it runs now, or something similar?).  Maybe there is a
> architecture specific flag you should be adding on these new
> machines ?
>
> Joe

I tried it on two different 32 bit machines (Redhat, Ubuntu) and three
different 64-bit machines. The machines don't matter. The compiler is
g++ in all 5 cases. The systems are comparable in speed and memory.

nandor...@gmail.com

Nov 27, 2008, 12:13:58 AM

> Random guess: maybe the lapack/blas library uses hand optimized
> assembler in 32 bit mode while it uses unoptimized C code (or fortran
> or whatever) in 64 mode. But I do not think this is enough to explain
> the slow down.
> --
> Giovanni P. Deretta

Based on printed output, it looks like everything is slower, not only
the lapack calls.

Maxim Yegorushkin

Nov 27, 2008, 7:15:41 AM

It is a linker error, not a compiler one. Try using -m32 switch for
linking as well.

--
Max

nandor...@gmail.com

Nov 28, 2008, 12:39:52 AM

> It is a linker error, not a compiler one. Try using -m32 switch for
> linking as well.
>
> --
> Max

I had the -m32 for the linker. I moved to another 64 bit machine: the
same slowness with a regular build, but it runs as fast as on the 32 bit
machines when I compile with -m32.

Paavo Helde

Nov 28, 2008, 2:29:33 AM

nandor...@gmail.com wrote:

You have to profile it and find out the reasons. One possible reason for
mysterious slowdowns is cache line thrashing; this depends on the size of the
data set, which might be different on a 64-bit platform. Such problems
might be curable by using a better algorithm.
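One common way to improve locality, offered as an assumption about this program rather than something the thread established, is to replace nested `vector<vector<double>>` storage with a single contiguous buffer. A minimal sketch (the class name `Matrix` and the function `sum_elements` are hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Row-major matrix in one contiguous buffer: element (i, j) lives at
// data_[i * cols_ + j], so walking along a row touches consecutive
// addresses instead of following a separate heap allocation per row.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols, double init = 0.0)
        : rows_(rows), cols_(cols), data_(rows * cols, init) {}

    double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// Sum all elements with the inner loop over j, matching the row-major
// layout so that consecutive iterations read adjacent memory.
double sum_elements(const Matrix& m) {
    double s = 0.0;
    for (std::size_t i = 0; i < m.rows(); ++i)
        for (std::size_t j = 0; j < m.cols(); ++j)
            s += m(i, j);
    return s;
}
```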

Another cause might of course be a programming error, like storing a result
of std::string::find() in a 32-bit unsigned int, and then comparing against
64-bit std::string::npos. But in this case the program would likely crash
or produce different results.
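A minimal sketch of the bug Paavo describes, with hypothetical function names. On LP64, `std::string::size_type` is 64 bits, so storing the result of `find()` in a 32-bit `unsigned int` truncates `npos`, and the "not found" test silently stops working; on a 32-bit build the two types have the same width and the bug stays hidden.

```cpp
#include <string>

// Buggy: the 64-bit result of find() is truncated to 32 bits, so on LP64
// a missing character gives 4294967295, which never equals npos.
bool contains_buggy(const std::string& s, char c) {
    unsigned int pos = s.find(c);     // silent narrowing conversion
    return pos != std::string::npos;  // "not found" is missed on LP64
}

// Correct: keep the index in the type that find() returns.
bool contains(const std::string& s, char c) {
    std::string::size_type pos = s.find(c);
    return pos != std::string::npos;
}
```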

hth
Paavo

nandor...@gmail.com

Nov 28, 2008, 4:19:24 PM

Thank you everybody for the help. I have found the solution.
It is simple, I just need the compiler flag -ffast-math.
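The thread does not say why `-ffast-math` helped. One frequently cited mechanism, offered here only as a hypothesis: when linking with `-ffast-math` on x86, GCC typically adds start-up code that enables the SSE flush-to-zero and denormals-are-zero modes, and arithmetic on subnormal (denormal) doubles can be very slow on many x86 processors. 32-bit x87 code keeps intermediates in extended precision with a wider exponent range, so they less often become subnormal. Note that `-ffast-math` also relaxes IEEE semantics, so results are worth re-validating. A minimal sketch that checks a data set for subnormal values (the function name `count_subnormals` is hypothetical):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Count values that are subnormal: nonzero but smaller in magnitude than
// the smallest normal double (about 2.2e-308). Many of these in hot loops
// is a warning sign for the kind of slowdown that flush-to-zero avoids.
std::size_t count_subnormals(const std::vector<double>& values) {
    std::size_t n = 0;
    for (double v : values)
        if (std::fpclassify(v) == FP_SUBNORMAL)
            ++n;
    return n;
}
```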