Create ASCII DAT Files For Export to Other Plotting Programs

Christos Efthymiou

unread,

Sep 12, 2022, 4:31:47 AM9/12/22

to MDnalysis discussion

Hello,

I am running many MD simulations that each generate large DCD files that I would like to analyze with MDAnalysis. Ideally, I would like to create a template notebook that includes calculation of the RMSD, RMSF, radius of gyration, H Bond analysis PCA, etc. for a trajectory, so that I can just replace the path to the PSF and DCD file each time I have a new trajectory to analyze. For each of these attributes, I would like for .dat files to be created so that I can download/export them and use them in other plotting programs such as OriginPro. Is this possible?

I appreciate any advice!

Best,

Christos

Hugo Macdermott-Opeskin

unread,

Sep 12, 2022, 4:36:29 AM9/12/22

to mdnalysis-...@googlegroups.com

Hi Christos,

You can run your analysis as planned in MDAnalysis and then access your results using the .timeseries attribute to get your results as a NumPy array. You can then use the np.savetxt function to save your results as a columnar .dat file or as a CSV etc etc. feel free to post code snippets here if you are having trouble and I can give you a hand.

Cheers

Hugo

--
You received this message because you are subscribed to the Google Groups "MDnalysis discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/80c86e99-49ee-4be4-a311-ab04fc1cf6c3n%40googlegroups.com.

--

Hugo MacDermott-Opeskin

Christos Efthymiou

unread,

Sep 12, 2022, 5:04:59 AM9/12/22

to MDnalysis discussion

Hi Hugo,

Thank you for offering some help. Here is some code that I have so far for the radius of gyration as that is all I have figured out to calculate so far:

for ts in u.trajectory[:20]:
time = u.trajectory.time
rgyr = u.atoms.radius_of_gyration()
rgyr_results = rgyr.timeseries
np.savetext('rgyr.dat', rgyr_results, delimiter=' ')

Running this causes the following error:

AttributeError Traceback (most recent call last)
Input In [8], in <cell line: 8>()
9 time = u.trajectory.time
10 rgyr = u.atoms.radius_of_gyration()
---> 11 rgyr_results = rgyr.timeseries
12 np.savetext('rgyr.dat', rgyr_results, delimiter=' ')

AttributeError: 'numpy.float64' object has no attribute 'timeseries'

What have I done wrong?

Thanks,

Christos

Christos Efthymiou

unread,

Sep 12, 2022, 12:10:05 PM9/12/22

to MDnalysis discussion

Just to follow up, I have edited the code to the following:

-----

for ts in u.trajectory[:20]:
time = u.trajectory.time
rgyr = u.atoms.radius_of_gyration()

rgyr_results = np.array(rgyr)
print(rgyr_results)

-----

Which gives the following output:

-----

46.293977798181224
46.28291511989854
46.29337494557188
46.248910816691165
46.20000956913493
46.26555146743779
46.23977494833417
46.27252240880911
46.26930326244436
46.30537571671158
46.235001390294954
46.29165361818403
46.24592737562194
46.262677698603554
46.26064613258189
46.21341804492745
46.274757819186654
46.266838371808404
46.29470594976873
46.25604421245822

----

If I then try to add the following line to the bottom of the code:

-----

np.savetxt('rgyr.dat', rgyr_results, delimiter = ' ')

-----

I get this error:

-----

ValueError Traceback (most recent call last)
Input In [21], in <cell line: 14>()
20 print(rgyr_results)
21 # rgyr_results = rgyr.timeseries
---> 22 np.savetxt('rgyr.dat', rgyr_results, delimiter = ' ')

File <__array_function__ internals>:180, in savetxt(*args, **kwargs)

File ~\anaconda3\lib\site-packages\numpy\lib\npyio.py:1397, in savetxt(fname, X, fmt, delimiter, newline, header, footer, comments, encoding)
1395 # Handle 1-dimensional arrays
1396 if X.ndim == 0 or X.ndim > 2:
-> 1397 raise ValueError(
1398 "Expected 1D or 2D array, got %dD array instead" % X.ndim)
1399 elif X.ndim == 1:
1400 # Common case -- 1d array of numbers
1401 if X.dtype.names is None:

ValueError: Expected 1D or 2D array, got 0D array instead

----

I will keep working at this to solve the issue, but please let me know if you notice any errors.

Christos Efthymiou

unread,

Sep 12, 2022, 3:22:41 PM9/12/22

to MDnalysis discussion

Here is the correct code:

----

rgyr = []
time = []
protein = u.select_atoms("protein")
for ts in u.trajectory[:10]:
time.append(u.trajectory.time)
rgyr.append(protein.radius_of_gyration())

time_rgyr = np.column_stack((time,rgyr))
rgyr_results = np.array(time_rgyr)
#print(rgyr_results)
np.savetxt('rgyr.dat', rgyr_results)

----

If you want it to work for every frame in your trajectory, remove the [:10].

Hugo Macdermott-Opeskin

unread,

Sep 13, 2022, 9:58:14 PM9/13/22

to MDnalysis discussion

Awesome work Christos,

Glad you got it working.

Let us know if you have any more questions.

Cheers

Hugo

Christos Efthymiou

unread,

Sep 24, 2022, 1:00:54 AM9/24/22

to MDnalysis discussion

Hi Hugo,

I have two additional questions.

First, how can I set the stride for a calculation? For example, this is the code I have for calculating the RMSD over the course of my trajectory:

````

R = rms.RMSD(u, # universe to align
u, # reference universe or atomgroup
select='backbone', # group to superimpose and calculate RMSD
#groupselections=[CORE, LID, NMP], # if you want to calculate the RMSD for certain regions of the protein,
#you can set variables like CORE = resi 80-200, and then the RMSD of that section will be calculated
#separately and be included as a column in the R.rmsd result
ref_frame=0) # frame index of the reference
R.run()
R.rmsd.shape
rmsd_results = np.array(R.rmsd)
print(rmsd_results)
np.savetxt('rmsd.dat', rmsd_results)

````

Where can I set the stride and how?

Second, I am having some trouble calculating the RMSF for my trajectory. Here is the code that I have:

````

R = rms.RMSD(u, # universe to align
u, # reference universe or atomgroup
select='backbone', # group to superimpose and calculate RMSD
#groupselections=[CORE, LID, NMP], # if you want to calculate the RMSD for certain regions of the protein,
#you can set variables like CORE = resi 80-200, and then the RMSD of that section will be calculated
#separately and be included as a column in the R.rmsd result
ref_frame=0) # frame index of the reference
R.run()
R.rmsd.shape
rmsd_results = np.array(R.rmsd)
print(rmsd_results)
np.savetxt('rmsd.dat', rmsd_results)

````

When I run that code, I get this error: DeprecationWarning: The `universe` attribute was deprecated in MDAnalysis 2.0.0 and will be removed in MDAnalysis 3.0.0. Please use `results.universe` instead. warnings.warn(wmsg, DeprecationWarning).

An output is still generated, but it does not have a column with the residue numbers. Is it possible to modify the above code to include the residue numbers from the PSF file as the first column?

Thank you!

Hugo Macdermott-Opeskin

unread,

Sep 25, 2022, 6:52:44 AM9/25/22

to MDnalysis discussion

Hi Christos,

You can use the start, stop step arguments to `run()`, see this paragraph coped from the userguide

"""Run the analysis with .run(). Optional arguments are the start frame index, stop frame index, step size, and toggling verbose. The default is to run analysis on the whole trajectory""" (https://userguide.mdanalysis.org/stable/examples/quickstart.html).

RE the RMSF, I think you forgot to include the RMSF code and used RMSD code instead? I think I can still answer your questions though.

The deprecation warning is mostly harmless and indicates that something will be removed in the future.

To get the resids, use the .resid attribute of your selection, for example, with "backbone" you could do:

u.select_atoms("backbone").resids to get the resids as a numpy array.

To save it to a two column file you would need to concatenate your two arrays using something like np.vstack(resids,rmsf ) before saving to text.

Cheers

Hugo

Christos Efthymiou

unread,

Oct 4, 2022, 1:28:17 PM10/4/22

to MDnalysis discussion

Hi Hugo,

Thank you for the help. If I want to set the step only, how would I do this? I tried using R.run(step=1000), but that did not work. I cannot find anything in the documentation for how to do this, so I appreciate your insight!

Best,

Christos

Hugo Macdermott-Opeskin

unread,

Oct 4, 2022, 6:23:24 PM10/4/22

to mdnalysis-...@googlegroups.com

Hi Christos,

try setting the start to 0, the end to the number of frames and the step to the step you want.

Cheers

Hugo

You received this message because you are subscribed to a topic in the Google Groups "MDnalysis discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mdnalysis-discussion/Kr3VecKpz04/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/76aa03fe-1c65-433e-88eb-61e21763d72an%40googlegroups.com.

Christos Efthymiou

unread,

Oct 6, 2022, 6:38:38 PM10/6/22

to MDnalysis discussion

Hi Hugo,

I am still having some trouble. Here is my code:

````

import MDAnalysis as mda
import numpy as np
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis as HBA

u = mda.Universe("Wildtype_100ns_Test1_QwikMD.psf", "MD.dcd")

hbonds = HBA(universe=u)
hbonds.hydrogens_sel = hbonds.guess_hydrogens("protein")
hbonds.acceptors_sel = hbonds.guess_acceptors("protein")
hbonds.run(0,-1,10)

print(hbonds.results.hbonds)
np.savetxt('hbonds_protein.dat', hbonds.results.hbonds)
#hbonds_results = np.array(HBA.hbonds)
#print(hbonds_results)
#np.savetxt('hbonds_protein.dat', hbonds_results)

````

And this is the error I get:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

OverflowError Traceback (most recent call last)
Input In [11], in <cell line: 11>()
9 hbonds.hydrogens_sel = hbonds.guess_hydrogens("protein")
10 hbonds.acceptors_sel = hbonds.guess_acceptors("protein")
---> 11 hbonds.run(0,-1,10)
13 print(hbonds.results.hbonds)
14 np.savetxt('hbonds_protein.dat', hbonds.results.hbonds)

File ~\anaconda3\lib\site-packages\MDAnalysis\analysis\base.py:295, in AnalysisBase.run(self, start, stop, step, verbose)
293 logger.info("Starting preparation")
294 self._prepare()
--> 295 for i, ts in enumerate(ProgressBar(
296 self._trajectory[self.start:self.stop:self.step],
297 verbose=verbose)):
298 self._frame_index = i
299 self._ts = ts

File ~\anaconda3\lib\site-packages\tqdm\notebook.py:258, in tqdm_notebook.__iter__(self)
256 try:
257 it = super(tqdm_notebook, self).__iter__()
--> 258 for obj in it:
259 # return super(tqdm...) will not catch exception
260 yield obj
261 # NB: except ... [ as ...] breaks IPython async KeyboardInterrupt

File ~\anaconda3\lib\site-packages\tqdm\std.py:1183, in tqdm.__iter__(self)
1180 # If the bar is disabled, then just walk the iterable
1181 # (note: keep this check outside the loop for performance)
1182 if self.disable:
-> 1183 for obj in iterable:
1184 yield obj
1185 return

File ~\anaconda3\lib\site-packages\MDAnalysis\coordinates\base.py:994, in FrameIteratorSliced.__iter__(self)
992 def __iter__(self):
993 for i in range(self.start, self.stop, self.step):
--> 994 yield self.trajectory[i]
995 self.trajectory.rewind()

File ~\anaconda3\lib\site-packages\MDAnalysis\coordinates\base.py:1610, in ProtoReader.__getitem__(self, frame)
1608 if isinstance(frame, numbers.Integral):
1609 frame = self._apply_limits(frame)
-> 1610 return self._read_frame_with_aux(frame)
1611 elif isinstance(frame, (list, np.ndarray)):
1612 if len(frame) != 0 and isinstance(frame[0], (bool, np.bool_)):
1613 # Avoid having list of bools

File ~\anaconda3\lib\site-packages\MDAnalysis\coordinates\base.py:1642, in ProtoReader._read_frame_with_aux(self, frame)
1640 def _read_frame_with_aux(self, frame):
1641 """Move to *frame*, updating ts with trajectory, transformations and auxiliary data."""
-> 1642 ts = self._read_frame(frame) # pylint: disable=assignment-from-no-return
1643 for aux in self.aux_list:
1644 ts = self._auxs[aux].update_ts(ts)

File ~\anaconda3\lib\site-packages\MDAnalysis\coordinates\DCD.py:182, in DCDReader._read_frame(self, i)
180 """read frame i"""
181 self._frame = i - 1
--> 182 self._file.seek(i)
183 return self._read_next_timestep()

File MDAnalysis\lib\formats\libdcd.pyx:427, in MDAnalysis.lib.formats.libdcd.DCDFile.seek()

OverflowError: Python int too large to convert to C long

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

How can I fix this?

Hugo Macdermott-Opeskin

unread,

Oct 6, 2022, 6:56:47 PM10/6/22

to MDnalysis discussion

Hi Christos,

This looks like a low level error and is likely not your fault.
How many frames in your trajectory? It looks to me like you are overflowing the size of C int32, (2147483647)

Christos Efthymiou

unread,

Oct 6, 2022, 7:02:12 PM10/6/22

to MDnalysis discussion

Hi Hugo,

There are 50,000 frames in my trajectory, it is only a 100ns simulation. I don't understand why setting a step size would cause a problem. I have been able to run RMSD analysis on all the frames, so shouldn't a step size make it even easier to run analysis?

Hugo Macdermott-Opeskin

unread,

Oct 7, 2022, 8:25:46 PM10/7/22

to MDnalysis discussion

Would you be able to attatch your trajectory so I can try and reproduce the error?

Cheers

Hugo

Christos Efthymiou

unread,

Oct 10, 2022, 6:23:36 PM10/10/22

to MDnalysis discussion

Hi Hugo,

Unfortunately my trajectory is too large to be attached. It is 45 GB. Is there any other way to speed up the calculation process other than setting a step size? I was using VMD for smaller trajectory files (20 GB), and the calculations for things like RMSD and RMSF were performed quite rapidly (minutes). MD Analysis is taking several hours to perform just the RMSD analysis for one trajectory, so I am unable to complete my analysis in an efficient way.

Best,

Christos

Irfan Alibay

unread,

Oct 10, 2022, 6:34:58 PM10/10/22

to MDnalysis discussion

Hi Christos,

By any chance are you using a Windows machine?

This unfortunately sounds like another case of this error: https://github.com/MDAnalysis/mdanalysis/issues/3075

> I was using VMD for smaller trajectory files (20 GB), and the calculations for things like RMSD and RMSF were performed quite rapidly (minutes). MDAnalysis is taking several hours to perform just the RMSD analysis for one trajectory, so I am unable to complete my analysis in an efficient way.

This doesn't seem right, here do you mean that when using the 20 GB file with MDAnalysis it is significantly slower than VMD? Are you also accounting for RMS alignment as part of the VMD timing here?

Best,

Irfan

Naveen Michaud-Agrawal

unread,

Oct 10, 2022, 7:51:45 PM10/10/22

to mdnalysis-...@googlegroups.com

Hi Christos,

Are you using a 64 bit version of python?

Regards,

Naveen

To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/0dcdb968-86ef-48e7-bf95-1419ba19541an%40googlegroups.com.

Christos Efthymiou

unread,

Oct 11, 2022, 12:45:34 PM10/11/22

to MDnalysis discussion

Hi,

Yes I am using a Windows machine. I see a workaround was provided for that specific scenario, but I am not sure how to apply it to my case. For example, here is my code for RMSD:

#Calculate RMSD
from MDAnalysis.analysis import rms

u = mda.Universe("R:/Wildtype_100ns_Test2/Wildtype_100ns_Test2/run/Wildtype_100ns_Test2_QwikMD.psf", "R:/Wildtype_100ns_Test2/Wildtype_100ns_Test2/run/MD.dcd")

R = rms.RMSD(u, # universe to align
u, # reference universe or atomgroup
select='backbone', # group to superimpose and calculate RMSD
#groupselections=[CORE, LID, NMP],

ref_frame=0) # frame index of the reference

R.run(0,-1,10)

R.rmsd.shape
rmsd_results = np.array(R.rmsd)
print(rmsd_results)

np.savetxt('rmsd_10stride.dat', rmsd_results)

How can I implement the given workaround in this scenario?

Yes, I am not sure why but MD Analysis seems to take exponentially longer than VMD. For example, I just ran an RMSD analysis with MD Analysis and it took 14 hours to complete. I am not sure what the RMS alignment means timing-wise for VMD. In VMD I would load a trajectory (~15 minutes) and then the RMSD would be calculated within about 2 minutes.

Best,

Christos

Christos Efthymiou

unread,

Oct 11, 2022, 12:46:43 PM10/11/22

to MDnalysis discussion

Hi Naveen,

Yes, I believe it is a 64 bit version. The output of this cell

import sys
print(sys.maxsize)

was 9223372036854775807.

Best,

Christos

Reply all

Reply to author

Forward