Some suspects:
- The lapack library
- Tolerances I use for floating point comparisons
- Large vector<vector<int> > variables (even vector<vector<vector< > > >
  variables)
- Need a compiler option on the 64 bit machines?
- Random number generator
> I have a fairly complex C++ program that uses a lot of STL, number
> crunching using doubles and the lapack library (-llapack -lblas -lg2c
> -lm). The code works fine on any 32 bit unix machine compiled with g++
> but when I try it on a 64 bit machine a running time of 10 seconds
> becomes 15 minutes. The code is complex, I could not create a simple
> subset that produces this problem. I tried this on several 32 and 64
> bit machines. The speed of the machines are comparable. I use -O2
> optimization. The program is not swapping to disk. What could cause
> this incredible slowdown?
[snip]
Just a quick question: does the program do the same thing on a 64-bit
machine as on a 32-bit machine? Has testing shown that for the same input
you get the same output?
Best
Kai-Uwe Bux
> Some suspects:
>
> -The lapack library
> - Tolerances I use for floating point comparisons
Shouldn't matter.
> - Large vector<vector<int> > variables (even vector<vector<vector< > > >
> variables)
Shouldn't matter. A heavy use of long might.
Try a gcc or Linux group or maybe comp.unix.programmer.
--
Ian Collins
There are small differences in the values of doubles but I guess
that's not unexpected.
I did try profiling but I did not make much sense of it. Generally it
seemed like everything takes somewhat longer.
> It's not uncommon for 64 bit
> executables to be slower when the code has been tuned for 32bit. That's
> one reason why 32 bit executables are still common on 64 bit platforms.
But could it be such a huge difference? What does it mean to be tuned
for 32bit?
My code does not depend on the word size. Could it be that the STL
library or the lapack library is optimized for 32 bit?
> Shouldn't matter. A heavy use of long might.
No long in the code.
I am not trying to run the executable compiled on the 32 bit machine.
I recompile everything on the 64 bit machines.
Why?
--
Ian Collins
It is my own code. Since I have the source code it makes sense to
recompile and hope it will be optimized for the new machine. I don't
know if the 32 bit executable would run on the 64 bit machines but
perhaps I should try that.
Could this piece of code be responsible?
extern "C"
{
  void dsyev_ (const char *jobz,
               const char *uplo,
               const int &n,
               double a[],
               const int &lda,
               double w[], double work[], int &lwork, int &info);
}

int
dsyev (const vector < vector < double > >&mat, vector < double >&eval,
       vector < vector < double > >&evec)
{
  ...
  dsyev_ ("V", "U", n, a, n, w, work, lwork, info);
  ...
}
This is how I use the Fortran lapack library. Perhaps the type sizes
change differently in C++ and in Fortran when going from 32 bit to
64 bit.
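
One way to check the type-size worry directly is to print the sizes in both builds. The sketch below is only an illustration: the names TypeSizes, measure_type_sizes, describe_type_sizes and int_matches_default_fortran_integer are made up here, and the 4-byte Fortran INTEGER is an assumption about how the lapack library was built (it is the usual default, but a library built with 8-byte integers would differ).

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Sizes (in bytes) of the types that matter when calling Fortran from C++.
struct TypeSizes {
    std::size_t int_size;
    std::size_t long_size;
    std::size_t pointer_size;
    std::size_t size_t_size;
    std::size_t double_size;
};

// Collects the sizes as seen by the compiler for the current target.
TypeSizes measure_type_sizes() {
    TypeSizes s;
    s.int_size = sizeof(int);
    s.long_size = sizeof(long);
    s.pointer_size = sizeof(void*);
    s.size_t_size = sizeof(std::size_t);
    s.double_size = sizeof(double);
    return s;
}

// Formats the sizes on one line so two builds can be compared by eye.
std::string describe_type_sizes(const TypeSizes& s) {
    std::ostringstream out;
    out << "int=" << s.int_size
        << " long=" << s.long_size
        << " pointer=" << s.pointer_size
        << " size_t=" << s.size_t_size
        << " double=" << s.double_size;
    return out.str();
}

// ASSUMPTION: the lapack library was built with the default 4-byte
// Fortran INTEGER. If so, passing C++ int for n, lda, lwork and info
// is consistent exactly when int is 4 bytes wide.
bool int_matches_default_fortran_integer(const TypeSizes& s) {
    return s.int_size == 4;
}
```

Printing describe_type_sizes(measure_type_sizes()) from a build with -m32 and from a regular build shows which sizes actually change. On a typical 64-bit Linux only long, pointers and size_t grow, while int and double stay the same.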
> Could this piece of code be responsible?
>
> extern "C"
> {
> void dsyev_ (const char *jobz,
> const char *uplo,
> const int &n,
> double a[],
> const int &lda,
> double w[], double work[], int &lwork, int &info);
> }
>
doubles or ints shouldn't be an issue.
You should try asking on a more specialised group. You should be able
to find a 64-bit porting guide for your platform.
--
Ian Collins
[]
Have you tried comparing 32 and 64-bit versions compiled on the very
same machine? Use -m32 compiler switch to compile a 32-bit version.
--
Max
Not really. Both the 64 bit machine and the 32 bit one are
probably using IEEE doubles. Even on a 32 bit machine, a double
is normally 64 bits.
Compiling in 64 bit mode will often result in some reduction in
speed, because of larger program size, and thus poorer locality.
I can't imagine this representing more than a difference of
about 10 or 20 percent, however, and I would expect it usually
to be a lot less.
Have you profiled the two cases, to see which functions have
become significantly slower?
--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Your #1 suspect you have not mentioned. You switched out the
machine! The amount of memory and the architecture (how much L1
cache for example) will make major differences. I'm sure you got
this new machine because it has better specs, but maybe it is short on
memory for the loading. (Is this program competing with a Windows O/S
while it runs now, or something similar?) Maybe there is an
architecture-specific flag you should be adding on these new
machines?
Joe
Random guess: maybe the lapack/blas library uses hand optimized
assembler in 32 bit mode while it uses unoptimized C code (or fortran
or whatever) in 64 bit mode. But I do not think this is enough to explain
the slow down.
--
Giovanni P. Deretta
That was my thought; many default OS installations will supply an
unoptimised "reference" BLAS/LAPACK.
> But I do not think this is enough to explain the slow down.
I have seen pretty drastic performance hits with reference BLAS/LAPACK as
compared with a vendor-supplied optimised version, or something like
ATLAS, although not quite that dramatic...
[BTW, is there something odd about the wrapping/flowing in the previous
article? My newsreader seems to display it as blank unless I force it to
wrap.]
--
Lionel B
I compiled the code on the 32-bit machine and ran it on the 64-bit
machine.
It runs without the extreme slowdown.
I was not able to compile on the 64-bit machine using -m32. It
produces the error message:
/usr/bin/ld: cannot open gcrt1.o: No such file or directory
collect2: ld returned 1 exit status
I tried it on two different 32 bit machines (Redhat, Ubuntu) and three
different 64-bit machines.
The machines don't matter. The compiler is g++ in all 5 cases. The
systems are comparable in speed and memory.
Based on printed output, it looks like everything is slower, not only
the lapack calls.
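
Since the printed output suggests the slowdown is spread out, one low-tech complement to a profiler is to time each phase and compare the numbers between the -m32 build and the regular build. A minimal sketch, assuming a C++11 compiler; seconds_taken and build_table are hypothetical names, and build_table only stands in for a real phase of the program.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename Work>
double seconds_taken(Work work) {
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Hypothetical stand-in for one phase of the real program: builds a
// nested vector whose element (i, j) holds i * cols + j.
std::vector<std::vector<int> > build_table(std::size_t rows, std::size_t cols) {
    std::vector<std::vector<int> > table(rows, std::vector<int>(cols));
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            table[i][j] = static_cast<int>(i * cols + j);
        }
    }
    return table;
}
```

Wrapping each phase in a lambda, for example seconds_taken([&]() { table = build_table(1000, 1000); }), and printing the result gives one number per phase. If a single phase accounts for most of the extra time in the 64-bit build, that is where to look first.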
It is a linker error, not a compiler one. Try using -m32 switch for
linking as well.
--
Max
I had the -m32 for the linker. I moved to another 64 bit machine. Same
slowness for regular compiling. Runs as fast as it does for the 32 bit
machines when I compile with -m32.
You have to profile it and find out the reasons. One possible reason for
mysterious slowdowns is cache line thrashing. This depends on the size of
the data set, which might be different on a 64-bit platform. Such problems
might be curable by using a better algorithm.
Another cause might of course be a programming error, like storing a result
of std::string::find() in a 32-bit unsigned int, and then comparing against
64-bit std::string::npos. But in this case the program would likely crash
or produce different results.
hth
Paavo
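
The std::string::find() pitfall mentioned above can be sketched as follows. This is only an illustration of the pattern, not code from the program under discussion; found_buggy and found_correct are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Buggy pattern: the result of find() is stored in an unsigned int.
// Where size_type is wider than unsigned int (typical 64-bit builds),
// the truncated value never compares equal to npos, so "not found"
// goes undetected. The cast is written out explicitly here; the
// implicit conversion behaves the same way.
bool found_buggy(const std::string& text, const std::string& needle) {
    unsigned int pos = static_cast<unsigned int>(text.find(needle));
    return pos != std::string::npos;
}

// Correct pattern: keep the result in std::string::size_type, which
// always has the same width as npos.
bool found_correct(const std::string& text, const std::string& needle) {
    std::string::size_type pos = text.find(needle);
    return pos != std::string::npos;
}
```

On a build where unsigned int and size_type have the same width, both functions agree. Where size_type is wider, found_buggy("lapack", "blas") reports a match that does not exist, which is the kind of error that would more likely change results or crash than merely slow the program down.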