flags for reducing memory consumption for quickstep and quickstep QM/MM


Axel

Sep 10, 2007, 1:32:37 PM
to cp2k
hi all,

are there any recommendations on how to reduce the memory consumption
of cp2k calculations, particularly using QS/GPW and QM/MM with QS/GPW,
while maintaining a given level of accuracy?

i currently have a couple of inputs that would _almost_ fit into a
machine with multi-core processors using all cores. right now i have
to run using half the cores and thus waste a lot of cpu time
allocation...

cheers,
axel.

Fawzi Mohamed

Sep 10, 2007, 4:47:52 PM
to cp...@googlegroups.com
Use OT with PRECONDITIONER FULL_KINETIC, and do not use DIIS as the minimizer (keep the default, CG).
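A sketch of the corresponding settings in the input (section nesting assumed from the standard &SCF layout; this is an illustration, not a complete input):

```
&SCF
  &OT
    PRECONDITIONER FULL_KINETIC
    MINIMIZER CG        ! the default; avoid DIIS here
  &END OT
&END SCF
```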

at the cost of slightly changing the results, you can try smoothing:

&XC_GRID
  XC_SMOOTH_RHO NN10
  XC_DERIV SPLINE2
&END

with a slightly lower cutoff (the total energies will change).
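The cutoff referred to here is set in the &MGRID section; a sketch (the value is illustrative, not a recommendation):

```
&DFT
  &MGRID
    CUTOFF 240   ! Ry; illustrative - lowering it below your converged value trades accuracy for memory
  &END MGRID
&END DFT
```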

I plan to improve the input memory usage Real Soon Now, which should
help.

Fawzi

Teodoro Laino

Sep 10, 2007, 5:03:11 PM
to cp...@googlegroups.com
Axel,

you may just want to try the keyword:

http://cp2k.berlios.de/input/InputReference~__ROOT__~GLOBAL.html#SAVE_MEM

This usually helps in reducing the amount of memory during an MD.
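A sketch of where that keyword goes (the project name is a hypothetical placeholder):

```
&GLOBAL
  PROJECT my_run      ! hypothetical project name
  RUN_TYPE MD
  SAVE_MEM            ! trade some speed for lower memory use during MD
&END GLOBAL
```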

Teo

tkuehne

Sep 11, 2007, 5:27:15 AM
to cp2k
Hi Axel

Have you already tried RS_GRID DISTRIBUTED?
As far as I remember it reproduces exactly the same numbers, at
least using GPW.

Best regards,
Thomas

toot

Sep 11, 2007, 5:48:43 AM
to cp2k
Toot toot everybody,

i tried RS_GRID DISTRIBUTED for all the grids i've got and it doesn't
make a blind bit of difference (to either memory or energies)!

cheers

Rachel

Teodoro Laino

Sep 11, 2007, 6:21:20 AM
to cp...@googlegroups.com
Just curiosity..
How did you check the memory in parallel runs? on which architecture?

teo

toot

Sep 11, 2007, 8:02:58 AM
to cp2k
Architecture: x86-64

Just compared virtual memory size, resident set size and used swap
("top") in each of the runs

rachel

Axel

Sep 11, 2007, 12:24:53 PM
to cp2k
hi rachel and everybody else who answered.

thanks a lot. it is great to see that we seem
to finally be getting some sort of 'community' started here.
please keep it up. it is much appreciated.

sadly, i was already using most of the tricks mentioned.
:-(

a couple more remarks:

On Sep 11, 8:02 am, toot <rachel.gla...@rub.de> wrote:
> Architecture: x86-64

rachel, please provide the full description of the platform,
i.e. hardware, compiler(!), parallel interconnect (hard and software)
and libraries.

i (and teo and many others here) already learned the hard way that
just the type of cpu is not enough to describe a platform, and that
with cp2k, due to its use of many 'newer' (new as in introduced less
than 30 years ago... :) ) fortran features, you are always running
the risk of being fooled by a bug in the compiler posing as a bug
in the code. it almost seems as if writing a fully correct _and_
well performing fortran 90/95 compiler is an impossible task, and
that compiler vendors test mostly against legacy (fortran 77 and
older) codes.

> Just compared virtual memory size, resident set size and used swap
> ("top") in each of the runs

i can confirm this on x86_64 using intel 10, OpenMPI, and MKL.
i tested with FIST. i noticed, however, that there are two entries
for ewald, one in /FORCE_EVAL/DFT/POISSON/EWALD and one in
/FORCE_EVAL/MM/POISSON/EWALD, and both claim to be applicable only
to classical atoms. it would be nice if somebody could clarify this.

out of the three EWALD options, SPME (which i have been using already)
seems to be the least memory hungry followed by plain EWALD and PME.
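For reference, a sketch of selecting SPME for the classical part (the parameter values are illustrative, not recommendations):

```
&MM
  &POISSON
    &EWALD
      EWALD_TYPE SPME   ! least memory hungry of the three in my tests
      ALPHA 0.35        ! illustrative value
      GMAX 64           ! illustrative value
    &END EWALD
  &END POISSON
&END MM
```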

what strikes me as odd is that in the communication summary, there
are the exact same number of calls to the MP_xxx subroutines in both
cases. i would have expected that in the distributed case there is a
(slightly?) different communication pattern than with replicated.
could it be that the flag is not correctly handed down? it appears in
the restart files, so i assume it is parsed ok.


cheers,
axel.

Teodoro Laino

Sep 11, 2007, 12:40:02 PM
to cp...@googlegroups.com
Very good Axel,

let me sum up the situation:

there's a keyword RS_GRID in the EWALD section (both in MM and DFT), but that one affects only the EWALD calculations.
In particular, EWALD in MM should be quite clear.. EWALD in DFT is used for DFTB.

the other place where there's a RS_GRID keyword is the &QS section, and this is the one that should affect the memory in your case.

if you use RS_GRID in the EWALD section, it is properly parsed but has no effect on your GPW calculation.
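To make the paths concrete, a sketch of the two locations as discussed in this thread (only the second one touches the GPW realspace grids):

```
&FORCE_EVAL
  &MM
    &POISSON
      &EWALD
        RS_GRID DISTRIBUTED   ! classical Ewald grids only; no effect on GPW
      &END EWALD
    &END POISSON
  &END MM
  &DFT
    &QS
      RS_GRID DISTRIBUTED     ! this is the one that matters for GPW memory
    &END QS
  &END DFT
&END FORCE_EVAL
```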

Teo

Axel

Sep 11, 2007, 1:08:23 PM
to cp2k
hi teo,

On Sep 11, 12:40 pm, Teodoro Laino <teodoro.la...@gmail.com> wrote:
> Very good Axel,
>
> let me point the situation:
>
> there's a keyword RS_GRID in the EWALD section (both in MM and DFT)
> but that one affects only the EWALD calculations.
> In particular EWALD in MM should be quite clear.. EWALD in DFT is
> used for DFTB.
>
> the other place where there's a RS_GRID keyword is in the &QS section
> and this should affect the memory in your case.

ok, so we should get into the habit of always using the full
"keyword-path" so that there are no misunderstandings.

> if you use RS_GRID in the EWALD, it is properly parsed but has no
> effect on your GPW calculation.

i was testing FIST, i.e. classical MD, against
FORCE_EVAL/MM/POISSON/EWALD/RS_GRID with all three variations of
ewald.

i now also tested FORCE_EVAL/DFT/QS/RS_GRID and found
that using DISTRIBUTED actually increases(!) memory usage
(and since linux does lazy memory allocation, RSS shows
actual used/touched memory pages and not only the reserved
address space).
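as an aside, the lazy-allocation behaviour mentioned above is easy to demonstrate outside cp2k; a minimal Linux-only Python sketch (it reads VmRSS from /proc, which does not exist on other systems):

```python
import mmap
import re

def rss_kb():
    """Resident set size of this process in KiB (Linux-specific)."""
    with open("/proc/self/status") as f:
        return int(re.search(r"VmRSS:\s*(\d+)", f.read()).group(1))

# Reserving address space alone barely changes RSS...
buf = mmap.mmap(-1, 64 * 1024 * 1024)  # 64 MiB anonymous mapping
reserved = rss_kb()

# ...but touching the pages commits them, and RSS grows.
for off in range(0, len(buf), 4096):   # write one byte per 4 KiB page
    buf[off] = 1
touched = rss_kb()

print(touched - reserved)  # roughly 64 MiB, expressed in KiB
```

this is why RSS from "top" reflects pages actually touched, not just reserved address space.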

i have been trying the H2O-32.inp from tests/QS/benchmarks
and explicitly added the RS_GRID flag with both values
and then ran across 6 cpus each. in this case there were
actually (very small) differences in total energies (as to
be expected) and also different numbers of calls to different
MP_xxx subroutines.

cheers,
axel.

Teodoro Laino

Sep 11, 2007, 1:11:48 PM
to cp...@googlegroups.com
>
> ok, so we should get into the habit of always using the full
> "keyword-path" so that there are no misunderstandings.
>
yep!

>
> i now also tested FORCE_EVAL/DFT/QS/RS_GRID and found
> that using DISTRIBUTED actually increases(!) memory usage
> (and since linux does lazy memory allocation, RSS shows
> actual used/touched memory pages and not only the reserved
> address space).

Yep, I hope people who know that part better than I do will look into
that..
It's a little bit strange that you observe an increase in memory usage..


Teo

toot

Sep 12, 2007, 6:54:23 AM
to cp2k
sorry axel - i'm new to this "forum" game ;-D

here goes:

Opteron 275 cpus, 4G RAM, Tyan board (52892);
Mellanox MHES18-XSC Infiniband card, Flextronics infiniband switch,
OFED-1.1 software;
intel fortran 9.0; mpirun 1.1.2
my job ran on 12 nodes, but each with only 1 cpu, cos otherwise it
swapped like mad and crashed after a while, which was rather
annoying, for me anyway; the computer probably didn't care.

I tried out all 3 grids; none of them seemed to make a difference (as
far as i can tell anyway...)
