Reston, VA
443-567-8328 (C)
410-278-2692 (O)
256 waters should scale very well beyond 2 CPUs.
Have you used the same optimizations with 1 and 2 CPUs (or is it even
the same executable)?
What happens if you run top during a run?
* does the 1-CPU executable really use only 1 CPU (or do LAPACK/FFT use
multithreading, and so occupy two CPUs)?
* does the 2-CPU run really go on two processors? Do you have some
problems with CPU affinity?
* is the timing CPU time? I.e. 2 CPUs = twice the time, but half the
real time?
ciao
Fawzi
The em64t is equipped with 8MB of RAM. I don't know if this can be a bottleneck with 256 waters.
nice numbers... just for the record
> first off, contrary to fawzi's statement, cpu affinity in OpenMPI has
> to be explicitly enabled (e.g. via setting mpi_paffinity_alone=1 in
> ~/.openmpi/mca-params.conf).
well I didn't say it was automatic :), what I meant was that I knew
that both LAM MPI (on which I tested) and Open MPI could do it, if
configured to do so, and then I explained how to get it with LAM MPI...
> however what both LAM/MPI and particularly OpenMPI have activated by
> default are algorithms that can take advantage of locality and that
> require the correct specification of nodes.
yep, I think that Open MPI is the best choice to have a well
performing MPI, especially with multiple cores.
There is active development and they try to take advantage of the
latest hardware.
ciao
Fawzi
As we already said on a similar topic in this mailing list, there are
a few things that can speed up
the SCF during an MD.
I will repeat them here, hoping that these suggestions will also help
other people.
The most fundamental one is a good extrapolator and a good
preconditioner for the SCF.
Instead of those used in the tests you ran, you may want to try:
&QS
EXTRAPOLATION ASPC
EXTRAPOLATION_ORDER 3
&END
and in &OT
PRECONDITIONER FULL_SINGLE_INVERSE
Of course the choice of the preconditioner depends strongly on the
system you're running. But as a preliminary
trial this should be ok.
Moreover, every time OT is used to optimize the wavefunction it is
HIGHLY recommended to use a nested SCF procedure.
This does not help at all with normal diagonalization schemes, but it
greatly improves the convergence with OT.
This is the way to go:
&SCF
  MAX_SCF 30
  &OUTER_SCF
    MAX_SCF 5
  &END OUTER_SCF
&END SCF
Don't use too many steps for the inner SCF level (in the range 30-40
should be ok) and only a few for the outer one (5-10).
Try it and you will see a great improvement for the first SCF
optimization.
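For reference, here is a rough sketch of how the pieces above could sit together in one input. It assumes the usual CP2K layout, with &QS and &SCF inside &FORCE_EVAL / &DFT, and &OT and &OUTER_SCF inside &SCF. The EPS_SCF values are illustrative assumptions only, not taken from the tests discussed here, and everything else in the input (basis sets, cutoff, subsystem, MD section) is omitted.

```
&FORCE_EVAL
  &DFT
    &QS
      EXTRAPOLATION ASPC
      EXTRAPOLATION_ORDER 3
    &END QS
    &SCF
      MAX_SCF 30            ! inner SCF steps (30-40 suggested above)
      EPS_SCF 1.0E-6        ! assumed threshold, adjust to your needs
      &OT
        PRECONDITIONER FULL_SINGLE_INVERSE
      &END OT
      &OUTER_SCF
        MAX_SCF 5           ! outer SCF steps (5-10 suggested above)
        EPS_SCF 1.0E-6      ! assumed, kept equal to the inner value
      &END OUTER_SCF
    &END SCF
  &END DFT
&END FORCE_EVAL
```

With these settings the wavefunction optimization can take at most 30 x 5 = 150 OT steps in total; the preconditioner is rebuilt at the start of each outer cycle, which is where the nested scheme is expected to help.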
Keep in mind that the absolute running time depends on many things:
CUTOFF, basis set, preconditioner, extrapolation,
the threshold for the convergence of the SCF, the precision required in
the integration and collocation of the density, and so on.
So a full comparison can be done only with exactly the same input
file.
Cheers,
Teo