I have seen this too on our 16-core machines, and could never figure out
a reason for it or a way around it.
For some time I suspected the power management of the Linux kernel. It
looked as if the M2 process was switching frequently between different
cores, which would trigger CPU frequency scaling on the individual
cores.
However, with recent versions of the Linux kernel M2 stays on the same
physical core, but is still slower than on my laptop, while a simple
C program with a trivial loop runs several times faster than on the
laptop.
I asked about this on the mailing list. Look in the archive for a thread
"M2 Performance on Opteron Processor".
regards
Thomas
--
Thomas Kahle
The fundamental theorem of algebra is open source. Like any other
mathematical theorem it can be applied free of charge and everybody
has access to its proof and can convince himself how it works. Why
should software be any different?
Anyway, try running M2 like this:
GC_NPROCS=3 GC_MARKERS=3 M2
Here is the explanation from the appropriate readme file for the
garbage collector we use:
GC_NPROCS=<n> - Linux w/threads only. Explicitly sets the number of
    processors that the GC should expect to use. Note that setting this
    to 1 when multiple processors are available will preserve
    correctness, but may lead to really horrible performance, since the
    lock implementation will immediately yield without first spinning.
GC_MARKERS=<n> - Only if compiled with PARALLEL_MARK. Set the number
    of marker threads. This is normally set to the number of
    processors. It is safer to adjust GC_MARKERS than GC_NPROCS,
    since GC_MARKERS has no impact on the lock implementation.
We do compile with PARALLEL_MARK.
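
If you want to compare a few settings rather than guess, something like the following sketch might help. It only sets the two variables described above in a copy of the environment and times a command. The helper names gc_env and timed_run are made up for this example and are not part of M2 or the collector, and it assumes M2 is on your PATH.

```python
import os
import subprocess
import time


def gc_env(markers, nprocs=None, base=None):
    """Return a copy of the environment with GC_MARKERS set.

    GC_NPROCS is only set when nprocs is given, since the README
    says adjusting GC_MARKERS is the safer of the two.
    """
    env = dict(os.environ if base is None else base)
    env["GC_MARKERS"] = str(markers)
    if nprocs is not None:
        env["GC_NPROCS"] = str(nprocs)
    return env


def timed_run(cmd, markers, nprocs=None):
    """Run cmd (a list of strings) with the GC variables set.

    Returns the wall-clock time in seconds.
    """
    start = time.perf_counter()
    subprocess.run(cmd, env=gc_env(markers, nprocs))
    return time.perf_counter() - start


# Example (not executed here): try several marker counts.
# The command line is whatever you normally use to start your computation.
#
# for n in (1, 2, 3, 4, 8):
#     print(n, timed_run(["M2", "--script", "bench.m2"], markers=n))
```

Here bench.m2 is a placeholder for your own test file. Varying only GC_MARKERS first seems the more cautious choice, given the README's warning about GC_NPROCS and the lock implementation.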