concatenate dcd files, restart, real timestemps

785 views
Skip to first unread message

bjrnfrdnnd

unread,
Dec 13, 2010, 2:43:53 PM12/13/10
to MDnalysis discussion
Hello all,

Is there a way of joining two dcd files?
Suppose I ran a lammps simulation for some time. I look at the
results, and find that I should run this thing
a bit longer. I therefore restart the simulation, effectively creating
a second dcd file.
I would now like to plot timelines from the entire run, consisting of
the concatenation of the two dcd files.
Is there an easy way to do it? Maybe even use some lammps feature that
joins 2 dcd files (I do not know any such featrure...).

My second question is: does MDAnalysis know anything about the real
timestemp used by lammps, and written to the dumb ascii file, for
example? I know how to extract frames and the framenumber, but is
there somewhere
a link to the actual number of timesteps that corresonponds to this
frame?

Thanks for all help in advance!

Oliver Beckstein

unread,
Dec 13, 2010, 4:57:58 PM12/13/10
to mdnalysis-...@googlegroups.com
On 13 Dec, 2010, at 19:43, bjrnfrdnnd wrote:
>
> Is there a way of joining two dcd files?

You can try

from MDAnalysis import *
u = Universe(PSF, ["traj_1.dcd", "traj_2.dcd"])

to load the two trajectories automagically in sequence. At the moment you will only be able to sequentially iterate through it, though. You can analyze the universe u as usual or you can write a new trajectory with (from memory...)

W = Writer("full.dcd", u.atoms.numberOfAtoms())
for ts in u.trajectory:
W.write(u)

Of course, you can also concatenate the dcds with VMD's catdcd tool http://www.ks.uiuc.edu/Development/MDTools/catdcd/ (although one needs to keep in mind that catdcd appears to treat times as ps whereas the CHARMM format is in "AKMA" units. I can't remember what LAMMPS does.)

> Suppose I ran a lammps simulation for some time. I look at the
> results, and find that I should run this thing
> a bit longer. I therefore restart the simulation, effectively creating
> a second dcd file.
> I would now like to plot timelines from the entire run, consisting of
> the concatenation of the two dcd files.
> Is there an easy way to do it?

Either of the two methods above should work.

> Maybe even use some lammps feature that
> joins 2 dcd files (I do not know any such featrure...).
>
> My second question is: does MDAnalysis know anything about the real
> timestemp used by lammps, and written to the dumb ascii file, for
> example? I know how to extract frames and the framenumber, but is
> there somewhere
> a link to the actual number of timesteps that corresonponds to this
> frame?

The dcd header has an entry for that. In MDAnalysis you can get the actual delta_t in ps from

u.trajectory.dt

The native simulation time step is supposed to be u.trajectory.delta and it is in native units (i.e. whatever is stored in the dcd file). dt is calculated as

dt = round(self.skip_timestep * self.convert_time_from_native(self.delta), 4)

where skip_timestep is typically the number of MD steps skipped between writing a frame to disk. Of course, one can also set skip_timestep to 1 and delta to the time between saved frame.

You are free to *change* these numbers before writing the trajectory to disk; this can be useful when, e.g. catdcd messed up the dcd header but you require correct information in there. If you want to change the header of the written trajectory then you supply additional keyword arguments to the Writer; for a dcd you can use following [see "help(MDAnalysis.coordinates.DCD.DCDWriter)"]:

W = Writer("full.dcd", u.atoms.numberOfAtoms(), delta=MDAnalysis.core.units.convert(1.0, 'ps', 'AKMA'), step=1, start=0)

which would write a dcd with a delta of 1 ps and a skip_timestep of 1, starting at frame 0. Note that the actual delta in the dcd file would be 20.45482949774598 AKMA units – unfortunately you have to do this conversion manually at the moment because the DCDReader always assumes that dcds contain AKMA units. (Please file a Issue if you want a different behaviour implemented.)

The timestep itself (either the 'ts' in the loop above or u.trajectory.ts) only contains the frame number itself, u.trajectory.ts.frame. Typically you can get the current time by calculating dt * frame.

Hope this helps.

Oliver

--
Oliver Beckstein * orbe...@gmx.net
skype: orbeckst * orbe...@gmail.com

Bjoern Nadrowski

unread,
Dec 16, 2010, 9:19:15 PM12/16/10
to mdnalysis-...@googlegroups.com
Thanks, that was helpful.
It is, however, not sufficient in order to get the real timestemps.
The missing information would be the timestemp for the first frame.
Is there a way to find out this timestep?
I also noticed that a dcd file that I read in using MDAnalysis resulted
in 2 timeframes. In reality, however, it contained 3 frames. The first
one was not read in, I do not know why.
I confirmed the presence of the 3 frames using pymol and vmd.
Is this a bug or a feature?

Elizabeth Denning

unread,
Dec 17, 2010, 12:03:47 AM12/17/10
to Bjoern Nadrowski, mdnalysis-...@googlegroups.com
That's a bug so please definitely file a bug report so this gets fix
in time for the next release.

Thanks!
~Elizabeth

> --
> You received this message because you are subscribed to the Google
> Groups "MDnalysis discussion" group.
> To post to this group, send email to mdnalysis-...@googlegroups.com
> .
> To unsubscribe from this group, send email to mdnalysis-discus...@googlegroups.com
> .
> For more options, visit this group at http://groups.google.com/group/mdnalysis-discussion?hl=en
> .
>

--
Liz

-------------
Elizabeth Denning *denn...@gmail.com

Oliver Beckstein

unread,
Dec 17, 2010, 10:02:37 AM12/17/10
to mdnalysis-...@googlegroups.com
Hi Bjoern,

On 17 Dec, 2010, at 02:19, Bjoern Nadrowski wrote:

> Thanks, that was helpful.
> It is, however, not sufficient in order to get the real timestemps.

I am not sure I understand what you mean. The DCD format only records the framenumber for each frame, not the actual time. The only way I know to get the time of the step is to use the information from the DCD header and compute the time myself as I described in my previous email. (We could add this to the MDAnalysis Timestep object – submit an enhancement request through http://code.google.com/p/mdanalysis/issues/list if you would want to see it.)

> The missing information would be the timestemp for the first frame.
> Is there a way to find out this timestep?

u.trajectory.dt

(if LAMMPS filled in the DCD header appropriately).

If this does not work then can you load the trajectory and report the output of

print u.trajectory._dcd_header()
print u.trajectory.dt

(_dcd_header() is a debugging function which is somewhat fragile and may not work on all architectures.)

For instance, for the example/testing trajectory:

>>> from MDAnalysis.tests.datafiles import PSF,DCD
>>> from MDAnalysis import Universe
>>> u = Universe(PSF,DCD)
>>> print u.trajectory._dcd_header()
{'natoms': 3341, 'reverse': 0, 'with_unitcell': 0, 'istart': 1000, 'nfixed': 0, 'setsread': 98, 'nsets': 98, 'header_size': 356, 'charmm': 1, 'first': 0, 'delta': 0.020454827696084976, 'fixedcoords_ptr': 0, 'file_desc': 4, 'nsavc': 1000, 'freeind_ptr': 0}
>>> print u.trajectory.dt
1.0


> I also noticed that a dcd file that I read in using MDAnalysis resulted
> in 2 timeframes. In reality, however, it contained 3 frames. The first
> one was not read in, I do not know why.
> I confirmed the presence of the 3 frames using pymol and vmd.
> Is this a bug or a feature?

Can you please describe what you did to access the frames?

I get for the test trajectory (which contains 98 confirmed frames)

>>> len(u.trajectory)
98
>>> frame_numbers = [ts.frame for ts in u.trajectory]
>>> print frame_numbers[:6]
[1, 2, 3, 4, 5, 6]
>>> print frame_numbers[-6:]
[93, 94, 95, 96, 97, 98]

Maybe you can get your times in a similar manner:

>>> t0 = u.trajectory.start_timestep / u.trajectory.skip_timestep * u.trajectory.dt
>>> t = [t0 + (ts.frame-1) * u.trajectory.dt for ts in u.trajectory]
>>> print t[:6]
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

If it still looks as if MDAnalysis is ignoring a frame then definitely submit a bug report. Please include information that allows us to reproduce the problem because right now I don't understand the bug.

Cheers,
Oli

Bjoern Nadrowski

unread,
Dec 17, 2010, 11:52:28 AM12/17/10
to mdnalysis-...@googlegroups.com
Thanks for all the hints, I got it sorted out.
I was wrong, the dcd is being read in completely.
The information I was looking for is in

u.trajectory.numframes,
u.trajectory.dt,
u.trajectory.skip_timestep

You can then build a complete time axis using

ts_timeaxis = scipy.arange(ts.numframes)*ts.dt
+ts.start_timestep*ts.dt/ts.skip_timestep

Note in this context, that u.trajectory.delta is less precise than
u.trajectory.dt/u.trajectory.skip_timestep

In my case, the lammps value was 0.012, but ts.delta gives
0.012000000104308128

Thanks for all your help!

Reply all
Reply to author
Forward
0 new messages