Outputting cell information and fractional coordinates

1172 views
Skip to first unread message

fxco...@gmail.com

unread,
Dec 11, 2013, 7:27:20 AM12/11/13
to cp...@googlegroups.com
Hi all,

I'm running QuickStep MD of triclinic unit cells, and dealing with the trajectory is not effortless, to say the least. I have to piece together the Cartesian coordinates from a .xyz files (MOTION / PRINT / TRAJECTORY) and the cell info (MOTION / PRINT / CELL). Is there any way to print directly fractional coordinates in the trajectory file, with cell information above (like a modified XYZ format). DCD is an option, but impractical since it doesn't contain labels…

Or does someone have an existing post-processing script for this task?

Cheers,
FX

Matthias Krack

unread,
Dec 11, 2013, 8:24:07 AM12/11/13
to cp...@googlegroups.com
Hi FX,

I could provide a dumpdcd code for CP2K output files in DCD format. Atomic labels are added if a prototype XYZ file is provided. It has also some additional features like frame selection (first/last dumped frame, stride), an optional application of PBC boundary conditions and a dump of all out-of-box atoms. I use often the DCD format, because it allows the consideration of the cell information e.g. VMD. I would put the code into the cp2k/tools folder in case you are interested.

Matthias

Teodoro Laino

unread,
Dec 11, 2013, 8:37:02 AM12/11/13
to cp...@googlegroups.com
Or alternatively use directly VMD. First load an xyz and then “load in molecule” the DCD.

That’s pretty fast and not problematic.
Teo

--
You received this message because you are subscribed to the Google Groups "cp2k" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cp2k+uns...@googlegroups.com.
To post to this group, send email to cp...@googlegroups.com.
Visit this group at http://groups.google.com/group/cp2k.
For more options, visit https://groups.google.com/groups/opt_out.

Teodoro Laino

unread,
Dec 11, 2013, 8:40:30 AM12/11/13
to cp...@googlegroups.com
and to reply to your question: not really. 
DCD is an option.
Alternatively you may want to try PDB. It contains cell information + label but not scaled coordinates (again, PDB is easily digested by VMD).

Best,
Teo

On 11 Dec 2013, at 13:27, fxco...@gmail.com wrote:

fxco...@gmail.com

unread,
Jan 29, 2014, 4:51:04 PM1/29/14
to cp...@googlegroups.com
Hi all,

I'm going back to this issue of outputting cell information.

I could provide a dumpdcd code for CP2K output files in DCD format. Atomic labels are added if a prototype XYZ file is provided. It has also some additional features like frame selection (first/last dumped frame, stride), an optional application of PBC boundary conditions and a dump of all out-of-box atoms. I use often the DCD format, because it allows the consideration of the cell information e.g. VMD. I would put the code into the cp2k/tools folder in case you are interested.

Yes, I know how to use DCD format, but I'm not so sure it contains all the information from the simulation. Only the (a,b,c,alpha,beta,gamma) information is written out, but not the actual unit cell matrix (h). Which means, unless a alignment is chosen and coordinates are rotated before being written to the DCD file (and from reading the source in particle_types.F, it doesn't look the case), some information is lost and there is no way to obtain fractional coordinates from the existing output format.

Or am I missing something?

FX

Matthias Krack

unread,
Jan 30, 2014, 4:07:34 AM1/30/14
to cp...@googlegroups.com
Hi FX,

you are right, the information about the alignment of the cell with respect to the Cartesian axes is lost. Thus it is currently not possible to retrieve the scaled coordinates for an atomic configuration in an unique manner from the dcd file information involving non-orthorhombic cells. This is however something which could be fixed. CP2K adopts a common convention for the cell alignment when an explicit cell vector definition is not available, namely, the cell vector a is aligned with the x axis and the cell vectors a and b lie in the xy plane. I think this convention could also be applied in this case and the Cartesian coordinates could be dumped only after the cell has been mapped to this representation using scaled coordinates.

Matthias

FX

unread,
Jan 30, 2014, 4:13:29 AM1/30/14
to cp...@googlegroups.com
you are right, the information about the alignment of the cell with respect to the Cartesian axes is lost. Thus it is currently not possible to retrieve the scaled coordinates for an atomic configuration in an unique manner from the dcd file information involving non-orthorhombic cells. This is however something which could be fixed. CP2K adopts a common convention for the cell alignment when an explicit cell vector definition is not available, namely, the cell vector a is aligned with the x axis and the cell vectors a and b lie in the xy plane. I think this convention could also be applied in this case and the Cartesian coordinates could be dumped only after the cell has been mapped to this representation using scaled coordinates.

Either that or allowing to dump into a format with support for non-aligned unit cell (like xtc: http://manual.gromacs.org/online/xtc.html). Otherwise, people currently using DCD files might get annoyed that the unit cell is constrained along an axis, which makes it not very suitable for visualization (it looks like it’s jumping all around).

In my particular case, since my simulations have already run, I’m writing a short program to reconstruct fractional coordinates from the .cell file and the .xyz trajectory.

FX

Matthias Krack

unread,
Jan 31, 2014, 9:59:27 AM1/31/14
to cp...@googlegroups.com
There is now an alternative dump format called DCD_ALIGNED_CELL which dumps the coordinates based on a cell alignment as described which should allow a reconstruction of the scaled coordinates only from the DCD dump data (to single precision accuracy).

Matthias

FX

unread,
Jan 31, 2014, 10:34:37 AM1/31/14
to cp...@googlegroups.com
> There is now an alternative dump format called DCD_ALIGNED_CELL which dumps the coordinates based on a cell alignment as described which should allow a reconstruction of the scaled coordinates only from the DCD dump data (to single precision accuracy).

Thanks!

FX
Message has been deleted

Efrem Braun

unread,
Aug 2, 2017, 4:39:51 AM8/2/17
to cp2k
Would someone be willing to explain this problem in some more detail? I'd like to convert the last snapshot of my pdb trajectory into a cif file containing fractional coordinates. I was planning to do this using obabel to get the cif file containing Cartesian coordinates, and then using obabel again to get a fractional coordinate file. However, I believe that obabel does the second conversion assuming that crytallographic conventions regarding the unit cell (https://en.wikipedia.org/wiki/Fractional_coordinates#In_Crystallography) are followed, and per this thread, I'm not sure whether this is the case.

Efrem Braun

Efrem Braun

unread,
Aug 2, 2017, 7:43:35 AM8/2/17
to cp2k
Similar question applies when using pwtools. io.write_cif is happy to write a cif file in fractional coordinates given a CP2K output pdb, but is it assuming that CP2K was writing the pdb using a convention that CP2K isn't using?

Efrem Braun

unread,
Aug 2, 2017, 4:05:41 PM8/2/17
to cp2k
I'll note that I've since realized that an easy and safe option for converting the final snapshot into a cif file containing fractional coordinates is to post-process the restart file (which contains the full unit cell matrix) into a file type that uses this kind of information, such as the VASP POSCAR file format, and then use obabel to convert that file into a cif. I'm still interested in hearing the answer to the question, but it's less pertinent to my own work. Sorry for the excess emails!

Steve Schmerler

unread,
Aug 26, 2017, 1:10:47 PM8/26/17
to cp2k
On Aug 02 04:43 -0700, Efrem Braun wrote:
> Similar question applies when using pwtools. io.write_cif is happy to write
> a cif file in fractional coordinates given a CP2K output pdb, but is it
> assuming that CP2K was writing the pdb using a convention that CP2K isn't
> using?

Hi Efrem

Sorry for being late on this topic. I just stumbled across your mail
mentioning pwtools, so I'd like to answer your question.

pwtools.io.read_pdb() uses pwtools.parse.PDBFile [1] which is a *very
very* basic parser that only extracts cartesian coordinates (ATOM
record), cell parameters (CRYST1 record) and atom symbols. It doesn't
care where the PDB file comes from, so it doesn't assume any
CP2K-specific information [*]. I haven't used CP2K's PDB output, so I
don't know if there is any additional information that may get ignored.

pwtools calculates the cell matrix using the same convention [2] that
Matthias mentioned in this thread some time ago (a along x, b in x-y
plane). Therefore the cell matrix of the Structure object (st.cell from
st=pwtools.io.read_pdb('foo.pdb')) is aligned as described. For
instance:

In [9]: from pwtools import io as pwio
In [10]: st=pwio.read_pdb('pwtools/test/files/pdb_struct.pdb')
In [11]: st.cryst_const
Out[11]: array([ 10.678, 10.678, 10.678, 90. , 90. , 90. ])
In [12]: st.cell
Out[12]:
array([[ 1.06780000e+01, 0.00000000e+00, 0.00000000e+00],
[ 6.53838926e-16, 1.06780000e+01, 0.00000000e+00],
[ 6.53838926e-16, 6.53838926e-16, 1.06780000e+01]])


If you have only cell parameters a,b,c,alpha,beta,gamma as in PDB or CIF
files, then there is no other way of calculating the cell, i.e. you need
to specify an orientation of the cell. This is why you get in trouble if
you only have cartesian atom coordinates and cell parameters available
and if the cartesian coordinates don't match (are not aligned with) the
cell calculated using the above convention (or any convention for that
matter). In that case, you are out of luck. In particular, the PDB
parser has no way of inferring whether the cartesian coordinates are
aligned.

If using DCD, you can use DCD_ALIGNED_CELL in a new calculation as
mentioned. What you can also do is let CP2K write .cell text files
which, IIRC, contain the cell vectors in Angstrom. Along with the XYZ
files of cartesian atom coordinates, you can then calculate fractional
coordinates from that. See [3] for how this is done in pwtools. If you
find any errors, please let me know! :) In general, I'd write fractional
coordinates whenever possible and avoid formats such as PDB. However,
last time I checked, CP2K has no option to write fractional coordinates
(I may be wrong, I haven't used it for a while).

[1] https://github.com/elcorto/pwtools/blob/master/parse.py#L441
[2] https://github.com/elcorto/pwtools/blob/master/crys.py#L190
[3] http://elcorto.github.io/pwtools/written/background/coord_trans.html

[*] pwtools is in no way affiliated with the CP2K project, even though it
can parse CP2K output and is linked from the CP2K wiki :)

best,
Steve

Efrem Braun

unread,
Aug 26, 2017, 3:27:39 PM8/26/17
to cp2k
Thanks Steve. It's very peculiar to me that CP2K would write out the PDB files without aligning the axes as per the normal convention; I think there's probably a lot of people that are doing what I did, i.e., assuming that the convention is being followed and that the output can be converted to other file formats using tools such as openbabel and pwtools. I'm sure the CP2K developers have a good reason for having chosen to do this, such as FX's point that it makes for better visualization. In that case, it might be a good idea to include a sentence in the documentation that the printed trajectory is for visualization purposes only. Otherwise, one needs to hope that the user will have read this thread to know how to use the output properly (I only knew about this because a CP2K user in my group expressly warned me against using the printed trajectory and pointed me to this thread).

Steve Schmerler

unread,
Aug 26, 2017, 5:32:01 PM8/26/17
to cp2k
On Aug 26 12:27 -0700, Efrem Braun wrote:
> Thanks Steve. It's very peculiar to me that CP2K would write out the PDB
> files without aligning the axes as per the normal convention; I think
> there's probably a lot of people that are doing what I did, i.e., assuming
> that the convention is being followed and that the output can be converted
> to other file formats using tools such as openbabel and pwtools. I'm sure
> the CP2K developers have a good reason for having chosen to do this, such
> as FX's point that it makes for better visualization. In that case, it
> might be a good idea to include a sentence in the documentation that the
> printed trajectory is for visualization purposes only. Otherwise, one needs
> to hope that the user will have read this thread to know how to use the
> output properly (I only knew about this because a CP2K user in my group
> expressly warned me against using the printed trajectory and pointed me to
> this thread).

I did a lot of NPT MD and had to take care of cell information. As I
mentioned, what I did was to have CP2K write the cell *vectors*

&global
run_type md
print_level low
&end global

&motion
&md
...
&end md
&print
...
&cell
&end cell
&end print
&end motion

which produces a file PROJECT-1.cell. Then I used pwtools to parse the
trajectory [1].

>>> tr = io.read_cp2k_md('cp2k.out')

The parser reads the cell file and calculates fractional coordinates,
among other things. That Trajectory object can be sliced like Python
lists, such that

>>> st = tr[-1]

is your last structure (a Structure object) which you can visualize
[2] or transform to other formats. Works also for relaxations runs.

best,
Steve

[1] http://elcorto.github.io/pwtools/written/tutorial.html#parse-md-output-plot-stuff
[2] http://elcorto.github.io/pwtools/written/tutorial.html#view-a-structure-or-trajectory
Reply all
Reply to author
Forward
0 new messages