sec build
--- -----
0.9 Win32, g++, optimized, linked with paq7asm
1.5 Win32, g++, optimized, pure C++ (-DNOASM)
3.6 Win32, g++, no optimizations
13.9 Linux, g++, optimized, linked with paq7asm-x86_64 (output size is 3225)
59 Linux, g++, optimized, pure C++ (-DNOASM)
308 Linux, g++, no optimizations
This is a dual boot system with a 2.2 GHz Athlon-64 and 2 GB memory.
Win32: XP SP2 home, MinGW g++ 3.4.5, compiled with g++ -O2 -Os -s
-march=pentiumpro -fomit-frame-pointer
Linux: Ubuntu 2.6.15.27-amd64-generic, g++ 4.0.3 x86_64, compiled with
-O2 -Os -s -fomit-frame-pointer
I will look into this further, but I wonder if anyone has seen this
behavior in other programs. Long ago I observed extremely slow
execution on a Solaris port of an earlier PAQ version, but I thought
that was just due to not using the assembler code.
One major difference between the 32 and 64 bit versions is the size of
type long and pointers. However, I compared the archives and they are
bitwise identical from all builds except the 64 bit assembler version.
XP Home does not run 64 bit programs even if you have the processor.
The 32-bit NASM assembler code does 16 bit signed vector operations
using the 64 bit MMX registers. The 64 bit YASM code by Matthew Fite
does the same using the 128 bit SSE2 registers. Of course I expected
the 64 bit version to be faster, with or without the assembler code
(which I hope to use in later PAQ versions). So I wonder if anyone has
observed other programs running much slower in 64 bit mode?
-- Matt Mahoney
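To make the MMX versus SSE2 comparison above concrete, here is a minimal sketch in portable C++ of a 16-bit signed dot product processed in fixed-width groups. It is not the paq7asm code: dot_scalar and dot_lanes are hypothetical names, and the lane counts (4 elements for a 64-bit register, 8 for a 128-bit register) simply model how many 16-bit values fit in each register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference: multiply signed 16-bit values, accumulate in 32 bits.
std::int32_t dot_scalar(const std::int16_t* a, const std::int16_t* b,
                        std::size_t n) {
    std::int32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<std::int32_t>(a[i]) * b[i];
    return sum;
}

// Hypothetical model of a SIMD loop that handles Lanes 16-bit elements per
// step: Lanes = 4 models a 64-bit register, Lanes = 8 a 128-bit register.
template <std::size_t Lanes>
std::int32_t dot_lanes(const std::int16_t* a, const std::int16_t* b,
                       std::size_t n) {
    assert(n % Lanes == 0);             // this sketch has no remainder handling
    std::int32_t partial[Lanes] = {};   // one running sum per lane
    for (std::size_t i = 0; i < n; i += Lanes)
        for (std::size_t k = 0; k < Lanes; ++k)
            partial[k] += static_cast<std::int32_t>(a[i + k]) * b[i + k];
    std::int32_t sum = 0;               // horizontal add of the lane sums
    for (std::size_t k = 0; k < Lanes; ++k)
        sum += partial[k];
    return sum;
}
```

For an input of n elements, dot_lanes<8> takes half as many outer steps as dot_lanes<4>, which is the reason one would expect the wider registers to help, all else being equal.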
I know chess software benefits a lot from 64-bit (and multi-core):
Something like:
32bit --> 64bit = 60% increase.
One core --> dual core = 70% increase.
See also some benchmarks:
http://www.sedatchess.com/hardwares.html
> ... execution speed. It seems to be much slower in 64-bit mode under Linux
> than 32-bit mode under Windows, even for pure C++ code.
<snip>
> This is a dual boot system with a 2.2 GHz Athlon-64 and 2 GB memory.
> Win32: XP SP2 home, MinGW g++ 3.4.5, compiled with g++ -O2 -Os -s
> -march=pentiumpro -fomit-frame-pointer
> Linux: Ubuntu 2.6.15.27-amd64-generic, g++ 4.0.3 x86_64, compiled with
> -O2 -Os -s -fomit-frame-pointer
Why do you force the compiler to produce code for the old 32-bit
Pentium Pro processor if you want a high-performance run on a 64-bit
Athlon CPU? Shouldn't you use "-march=athlon64" instead of
"-march=pentiumpro"?
Christian
If Ubuntu is able to run 32 bit code, try to compile with -m32
and see what the speed is.
> One major difference between the 32 and 64 bit versions is the size of
> type long and pointers.
The problem is that since long is 64 bit you double data size
which results in dcache thrashing. Try to replace long with
int.
> Of course I expected
> the 64 bit version to be faster, with or without the assembler code
> (which I hope to use in later PAQ versions). So I wonder if anyone has
> observed other programs running much slower in 64 bit mode?
I have never observed a slowdown in compute-bound programs.
At least not when long has been replaced with int :)
Laurent
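The data-size point above can be checked directly. The sketch below is only an illustration: table_bytes and extra_bytes_with_long are hypothetical helpers, not anything from the PAQ source. It compares the footprint of a table declared with long against the same table declared with a fixed-width 32-bit type. On 64-bit Linux (LP64) long is 8 bytes, so the table doubles. On 32-bit x86 and on 64-bit Windows long stays at 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes occupied by a table of n elements of type T.
template <typename T>
constexpr std::size_t table_bytes(std::size_t n) {
    return n * sizeof(T);
}

// Extra bytes used when the same table is stored as long instead of
// std::int32_t. The result is 0 where long is 32 bits wide.
inline std::size_t extra_bytes_with_long(std::size_t n) {
    assert(sizeof(long) >= sizeof(std::int32_t));  // long has at least 32 bits
    return table_bytes<long>(n) - table_bytes<std::int32_t>(n);
}
```

Using std::int32_t (or plain int on these platforms) keeps the element size the same in both builds, so any remaining speed difference has to come from somewhere else.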
In fact I had to take out -march=pentiumpro in Linux (machine type not
supported error). I forgot to mention this. In Windows it was the
lowest processor type that didn't significantly affect performance.
Anyway I will post when I find the bug.
-- Matt Mahoney
for (int i=0; i<ncxt; ++i) {
  for( int j=0; j< nx; j++ )
#ifdef NOASM // no assembly language
    pr[i]=squash(dot_product(&tx[0], &wx[cxt[i]*N], nx)>>5);
#elif __x86_64
    pr[i]=squash(dot_product_x86_64(&tx[0], &wx[cxt[i]*N], nx)>>5);
#else
    pr[i]=squash(dot_product(&tx[0], &wx[cxt[i]*N], nx)>>5);
#endif
}
The internal j loop is spurious.
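To show why the extra loop hurts without changing the output, here is a simplified sketch. It is an assumption-laden stand-in for the fragment above, not the PAQ code: squash() and the cxt[] indirection are left out, weights for context i are assumed to start at wx[i*nx], and dot_product_counted, predict_spurious and predict_fixed are hypothetical names. The counter records how many dot products each variant computes.

```cpp
#include <cassert>
#include <vector>

static long g_calls = 0;  // number of dot products computed so far

// Plain dot product that also counts how often it is called.
int dot_product_counted(const short* t, const short* w, int n) {
    ++g_calls;
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += t[i] * w[i];
    return sum;
}

// With the spurious inner loop: j is never used, so each pr[i] is simply
// recomputed nx times with the same arguments.
std::vector<int> predict_spurious(const std::vector<short>& tx,
                                  const std::vector<short>& wx,
                                  int ncxt, int nx) {
    std::vector<int> pr(ncxt);
    for (int i = 0; i < ncxt; ++i)
        for (int j = 0; j < nx; ++j)
            pr[i] = dot_product_counted(&tx[0], &wx[i * nx], nx) >> 5;
    return pr;
}

// With the inner loop removed: one dot product per context.
std::vector<int> predict_fixed(const std::vector<short>& tx,
                               const std::vector<short>& wx,
                               int ncxt, int nx) {
    std::vector<int> pr(ncxt);
    for (int i = 0; i < ncxt; ++i)
        pr[i] = dot_product_counted(&tx[0], &wx[i * nx], nx) >> 5;
    return pr;
}
```

Both functions return the same pr values, which would be consistent with identical archives from a slow build. The spurious version makes ncxt*nx calls instead of ncxt, so the work per prediction grows by a factor of nx.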
Matt Mahoney wrote:
> sec build
> --- -----
> 0.9 Win32, g++, optimized, linked with paq7asm
> 1.5 Win32, g++, optimized, pure C++ (-DNOASM)
> 3.6 Win32, g++, no optimizations
> 13.9 Linux, g++, optimized, linked with paq7asm-x86_64 (output size is
> 3225)
> 59 Linux, g++, optimized, pure C++ (-DNOASM)
> 308 Linux, g++, no optimizations
>
> This is a dual boot system with a 2.2 GHz Athlon-64 and 2 GB memory.
> Win32: XP SP2 home, MinGW g++ 3.4.5, compiled with g++ -O2 -Os -s
> -march=pentiumpro -fomit-frame-pointer
> Linux: Ubuntu 2.6.15.27-amd64-generic, g++ 4.0.3 x86_64, compiled with
> -O2 -Os -s -fomit-frame-pointer
On a 2.4 GHz Opteron, gcc 4.1.1 and NOASM I get this:
64 bit build: 1.55 sec
32 bit build: 1.96 sec
This definitely looks better :-)
Laurent
Yes, that is exactly the problem (in Mixer::p()). I took out the extra
loop and it worked (with -DNOASM).
Now I just need to find the other bug in the 64 bit assembler code.
Hopefully this should result in 64 bit Linux versions of all PAQ
versions starting with PAQ7.
I have fixed the bugs in the 64 bit SSE2 assembler code (in train()).
The new code can be linked to any paq7 or paq8 version with no source
code changes to produce a 64 bit Linux executable. I have not tested
it with 64 bit Windows or 32 bit Linux but there is no reason the code
should not work as written. On my Athlon-64, the Linux-64 version is
about 7% faster than the Win32 version. I have produced Linux
executables for paq8f and paq8jd. The code is here:
http://cs.fit.edu/~mmahoney/compression/#paq8
direct links:
http://cs.fit.edu/~mmahoney/compression/paq8f.zip
http://cs.fit.edu/~mmahoney/compression/paq8jd.zip (newer, better
compression, but slower)
Archive contents:
paq8f.cpp or paq8jd.cpp - source code (unchanged)
paq7asm.asm - 32 bit NASM/YASM assembler code (unchanged)
paq7asm-x86_64.asm - 64 bit YASM assembler code ver. 2 (fixed)
paq7asm-x86_64.o - above, assembled with YASM for 64 bit Linux (new)
paq8f.exe or paq8jd.exe - Win32 executable linked with paq7asm
(unchanged)
paq8f or paq8jd - Linux x86_64 executable (new)
See paq7asm-x86_64.asm comments for 64 bit compilation instructions.
-- Matt Mahoney
Note: the original 64 bit code at
http://ilovemyking.googlepages.com/paqpage uses the old assembler code,
which does not work. Use this code instead.
Couldn't you try 32 bit Linux with -m32? I really would like
to see how it compares.
Laurent
I haven't tried it, but I think it should run at the same speed as the
32 bit Windows version or a bit slower. Some compilers like Intel and
VC++ produce code a bit faster than g++.
-- Matt Mahoney