hi!

is there a (simple) way in cp2k to get a merged trajectory output
from the calculations of wannier centers with the atom positions,
similar to the IONS+CENTERS.xyz file in CPMD?
right now, the code seems to print out the centers in an .xyz-like
format (with .data appended) into individual files, which could get
a bit messy due to the need for direct postprocessing for the length
of trajectories i'm currently looking at.
cheers,
axel.
If you want to have the positions and the wannier centers in the same
file, you have to play with the FILENAME parameter of the print keys
(so that they use the same name).
You can also force the system to use a file name of your choice with
FILENAME =myBeautifulFilename.myGreatExtension
in the print key.
The files are always opened in append mode and closed each time.
Thus, if you use xyz format for the trajectory, you should get what
you want (but I didn't try exactly that combination).
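in input-file terms, that suggestion would look roughly like this (a sketch only: the section paths and the file name are illustrative, and this exact combination is untested):

```
&MOTION
  &PRINT
    &TRAJECTORY
      FORMAT XYZ
      FILENAME =ions+centers.xyz    ! leading "=" forces this exact name
    &END TRAJECTORY
  &END PRINT
&END MOTION

&FORCE_EVAL
  &DFT
    &PRINT
      &LOCALIZATION
        FILENAME =ions+centers.xyz  ! same name -> appended to the same file
      &END LOCALIZATION
    &END PRINT
  &END DFT
&END FORCE_EVAL
```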
ciao
Fawzi
Cheers
teo
thanks teo and fawzi for your explanations.

unfortunately, this raises another question
that i'd better ask in a separate thread.
it is obvious that i can fix everything up with
postprocessing later, but i'd rather have everything
merged into one file from within the code, so that i don't
have to synchronize multiple sets of parameters all the time.
preferably binary, as postprocessing text-format files can
become very time consuming (e.g. in QM/MM!), and
for the kind of calculations that i have in mind the
text files would be _huge_, and writing them would also
hurt parallel performance.
so if cp2k cannot do it as it is right now, can you give
me any hints on how i could implement this without having
to read through all of the source code? the way i see it,
i'd need to allocate some memory somewhere, potentially
set up a proxy topology for it, copy the coordinates into the
allocated memory, then the wannier centers, apply PBC, and
then write using the normal write functionality (so one can
have either .xyz, .pdb, .dcd, or whatever other format will
be supported in the future).
apropos parallel performance: would there be a way to tell
cp2k to keep files open during a run? on several machines
that we are running on, frequent opening/closing of files can
have a serious impact on (parallel) performance.
ciao,
axel.
>
>
> thanks teo and fawzi for your explanations.
>
> unfortunately, it raises another question,
> that i'll better ask in a separate thread.
>
> it is obvious that i can fix everything up with
> postprocessing later, but i'd rather have everything
> merged in one file from within the code so that i don't
> have to synchronize multiple sets of parameters all the time,
> preferably binary as postprocessing text format files can
> become very time consuming (e.g. in QM/MM!) and also
> for the kind of calculations that i have in mind the
> text files would be _huge_ and writing them would also
> affect parallel performance.
>
IONS+CENTERS.xyz is a text file, as far as I understand from CPMD..
So writing a text file takes as much time in cp2k as in cpmd..
Now the question arises.. why create a new format?
The world is already full of a mess of formats (many of them not
even documented (have a look at the time we both spent together on
the PSF))..
But since we live in a democratic world, if you want a single file
this is what you have to do:
Immediately after you have your wannier centers you can create a fake
particle_set with the dimension of particles + wannier centers
(you can do this locally.. no need to have one allocated from the
very beginning of the calculation..
the cost of allocating/deallocating this particle_set is negligible
w.r.t. the QS calculation)
and fill the particle_set with all the information of the real
particles and the fake information from the wannier centers..
You can limit yourself to filling in only the information that is
printed out by the routines that write the coordinates..
Once the particle_set is filled you can call the routine that dumps
the atomic coordinates (at the moment it supports only XYZ and DCD,
no PDB)..
Remember to apply PBC (if you want them), because normally the
particle_set is never processed regarding PBC..
In this way you get one single file with all the information you need..
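just to make the recipe concrete, the same logic can be sketched in a few lines of Python (an illustration only, not CP2K code: the function names, the dummy 'X' element for the centers, and the orthorhombic box are my own assumptions):

```python
# Illustration of the recipe above in Python (not CP2K Fortran):
# merge the real atoms and the wannier centers into one "particle set",
# apply PBC, and dump everything through a single xyz writer.
# All names and the orthorhombic-box assumption are hypothetical.

def wrap_pbc(pos, box):
    """Wrap a position into [0, L) along each axis of an orthorhombic box."""
    return tuple(p % L for p, L in zip(pos, box))

def merged_frame(atoms, centers, box):
    """atoms: list of (symbol, (x, y, z)); centers: list of (x, y, z).
    Returns one merged list, with the centers labelled 'X' as dummy atoms."""
    merged = [(sym, wrap_pbc(pos, box)) for sym, pos in atoms]
    merged += [("X", wrap_pbc(pos, box)) for pos in centers]
    return merged

def write_xyz_frame(fh, frame, comment=""):
    """Append one frame in plain xyz format: count, comment line, atoms."""
    fh.write(f"{len(frame)}\n{comment}\n")
    for sym, (x, y, z) in frame:
        fh.write(f"{sym} {x:12.6f} {y:12.6f} {z:12.6f}\n")
```

appending every MD step to the same open file handle then gives the single IONS+CENTERS-style trajectory being discussed.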
> so if cp2k cannot do it as it is right now, can you give
> me any hints, on how i could implement this without having
> to read through all of the source code? the way i see it,
> i'd need to allocate some memory somewhere and potentially
> have a proxy topology for it, copy the coordinates into the
> allocated memory, then the wannier centers, apply PBC and
> then write using the normal write functionality (so one can
> have either .xyz, .pdb, .dcd or whatever else format will
> be supported in the future).
>
> apropos parallel performance, would there be a way to tell
> cp2k to keep files open during a run? on several machines
> that we are running on, frequent open/close of files can
> have serious impact on (parallel) performance.
This cannot *easily* be avoided, due to the general idea behind the
print_keys and the great flexibility they provide..
Let me just say that in general I/O (even without continuously opening/
closing the unit) has a great impact on the
parallel performance (since we don't do parallel I/O).. So people are
highly encouraged to write to the disk only
at a reasonable frequency.. obviously there are cases in which you
have to write with a high frequency..
Well, in these cases I could never imagine that opening/closing a unit
has a greater impact on the performance
than writing the data to the files..
Just out of curiosity, can you provide some numbers regarding this
behavior?
ciao,
Teo
>
> ciao,
> axel.
>
[...]
> IONS+CENTERS.xyz is a text file, as far as I understand from CPMD..
> So writing a text file takes as much time in cp2k as in cpmd..
in my (hacked) version of cpmd i can write .dcd. ;-)
> Now rises the question.. why creating a new format?
sorry, but this is _not_ a new format. this is about writing an
alternative output in a well-supported format (.dcd, .xyz, .pdb).
i'd rather call the current cp2k output of the wannier centers a
new format, with the spread being written in the three columns
after the coordinates, as plain .xyz has only the coordinates.
ok, almost all visualization programs ignore everything beyond
the 4th column, so in practice it is no problem and instead offers
desirable additional information.
as to why write an _alternate_ output file: convenience and
consistency. with postprocessing, i always run the risk of
mixing up files and having to re-do something; that seems
much easier to avoid by doing the merge during the run itself.
> The world is already full of a messiness of formats (many of them not
> even documented (have a look at the time we both spent together for
> the PSF))..
exactly! why should i have to write a program that needs to
postprocess my data, when cp2k can write it in a well-supported
format right away?
> But since we live in a democratic world, if you want a single file
> this is what you've to do:
>
> Immediatey after you've your wannier centers you can create a fake
> particle_set with the dimension of particles+wannier centers
> (you can do this locally.. no need to have one allocated since the
> very beginning of the calculation..
> the cost of allocating/deallocating this particle_set is negligible
> w.r.t. the QS calculation)
> and fill the particle_set with all information of the real particles
> and the fake informations from wannier centers..
> You can limit yourself to fill the information in the particle_set
> only printed out from the routines that write the coordinates..
> Once the particle_set is filled you can call the routine that dumps
> the atomic coordinates (at the moment supports only XYZ, DCD no PDB)..
> Remember to apply PBC (if you want them) because normally the
> particle_set is never processed regarding PBC...
> In this way you get one single file with all informations you need...
thanks a lot. i'll look into it.
[...]
> > apropos parallel performance, would there be a way to tell
> > cp2k to keep files open during a run? on several machines
> > that we are running on, frequent open/close of files can
> > have serious impact on (parallel) performance.
>
> This cannot *easily* be avoided due to the general idea behind of the
> print_keys and the high potentiality they have..
> Let me just say that in general I/O (even if not continuously opening/
> closing the unit) has great impact on the
> parallel performance (since we don't do parallel IO).. So people are
> highly invited to write to the disk only
> with a reasonable frequency.. obviously there are cases in which you
i totally agree. this is what i am currently experimenting
with and hence the many questions about i/o.
> have to write with an high frequency..
> Well in these cases I could never imagine that opening/closing a unit
> has an impact on the performance greater
> than writing data on files..
please factor in i/o buffering. with a close you force a flush and
a sync of the file. if you keep the file open, you'll write to a
buffer, and only when the buffer is full will it be written to the
file system.
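the effect is easy to demonstrate outside of cp2k; a small python sketch (file names arbitrary, plain python buffering rather than fortran units):

```python
# Demonstration of write buffering: data handed to a buffered stream
# stays in the user-space buffer until it fills, is flushed, or the
# file is closed -- so every close forces the write through to disk.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "traj.xyz")

f = open(path, "w", buffering=65536)   # large user-space buffer
f.write("frame data\n")                # lands in the buffer only

on_disk_before = open(path).read()     # "" -- nothing written through yet
f.flush()                              # a close() would force this too
on_disk_after = open(path).read()      # "frame data\n" -- now on disk
f.close()
```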
> Just for my curiosity can you provide some numbers regarding this
> behavior?
no numbers with cp2k. i've seen changes ranging from significant to
dramatic with cpmd and particularly quantum espresso (since the
wavefunction files there are temporary, one can even create a
pseudo-ramdisk by having a huge buffer that can hold the whole file
and intercepting all flushes and file closes). this currently affects
only machines like the cray xt3 with no local disk at all, where the
iobuf module from cray allows me to manage file buffers on a
per-file(name) basis and to intercept flushes and closes. however,
intercepting close under those circumstances works only for scratch
files, as there is then no final flush/close at the end of the job.
so the open/close behavior of cp2k will render all optimizations in
that direction meaningless.
this may not be a big issue for most of the current machines and
users, but i expect that more and more machines will have to use
parallel file systems like lustre, GPFS and the like, and at the
same time people want to run larger jobs faster on their new fancy
machines, so this may become much more important over the next few
years.
e.g. at the moment, i have managed to get the 64-water QS benchmark
example (without localization; it may be interesting to try running
that on a separate set of nodes, btw) down to about 10 seconds per
MD step without tampering with the potentials, basis set, cutoff etc.,
and it seems to stop scaling at around 64 cpus (= 32 dual-core nodes)
on the xt3 in pittsburgh. this is already _very_ nice, but on an
'extreme' machine like the xt3 one should be able to go a little
further (1 second/MD step ???). i'm not thinking short term here
(as you know, i rarely have time to do anything short term), but
about what to do when we get access to true petascale hardware, and
for that it seems reasonable to me to first evaluate how far one can
push on high-end hardware with the existing software and some
minimal(?) modifications.
cheers,
axel.
> ciao,
> Teo
> > ciao,
> > axel.
On Jul 9, 1:58 pm, Teodoro Laino <teodoro.la...@gmail.com> wrote:
> just to share my experience:
> on our (CSCS) XT3 machine the IT people installed a special library
> (iobuf, a vendor library from CRAY)..
> this library does what you were describing in your e-mail.. it is
i was referring exactly to this module/library.
> intercepting all IO operations and keeping them in memory; only when
> the IO buffer is full (~ 100 MB per node) is everything flushed.. in
> this way all the problems related to the scalability of the code due
> to IO operations are greatly damped..
right! it is very useful. i've been using even larger buffers
in some cases. but open/close interferes with it, since iobuf
cannot know which close is the last one, so one can only
intercept the closing of files that one does not care about
after the run is done. so for all files that one wants to
keep, iobuf is of limited use, since every close implies
a flush and a sync.
as i wrote before, this is of limited relevance now, but i expect
it to become more of a problem in the not-so-distant future.
axel.
The possibility to dump the IONS+CENTERS stuff in xyz, dcd and
atomic format is now available..
The keyword is DFT%PRINT%LOCALIZATION%IONS+CENTERS (default FALSE,
so that the old format is produced by default)..
If this keyword is switched to .TRUE., the IONS+CENTERS format is
produced instead (particles + wannier centers)..
The keyword FORMAT controls the format of the IONS+CENTERS info..
The keyword FORMAT has no effect if the keyword IONS+CENTERS is set
to FALSE.
regtest: test/QS/regtest-gpw-4/H2O-debug-4.inp is an example of the
above keywords..
Teo