Create ASCII DAT Files For Export to Other Plotting Programs

44 views
Skip to first unread message

Christos Efthymiou

unread,
Sep 12, 2022, 4:31:47 AMSep 12
to MDnalysis discussion
Hello, 

I am running many MD simulations that each generate large DCD files that I would like to analyze with MDAnalysis. Ideally, I would like to create a template notebook that includes calculation of the RMSD, RMSF, radius of gyration, H Bond analysis PCA, etc. for a trajectory, so that I can just replace the path to the PSF and DCD file each time I have a new trajectory to analyze. For each of these attributes, I would like for .dat files to be created so that I can download/export them and use them in other plotting programs such as OriginPro. Is this possible? 

I appreciate any advice! 

Best, 
Christos 

Hugo Macdermott-Opeskin

unread,
Sep 12, 2022, 4:36:29 AMSep 12
to mdnalysis-...@googlegroups.com
Hi Christos, 

You can run your analysis as planned in MDAnalysis and then access your results using  the .timeseries attribute to get your results as a NumPy array. You can then use the np.savetxt function to save your results as a columnar .dat file or as a CSV etc etc. feel free to post code snippets here if you are having trouble and I can give you a hand. 

Cheers

Hugo 

--
You received this message because you are subscribed to the Google Groups "MDnalysis discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/80c86e99-49ee-4be4-a311-ab04fc1cf6c3n%40googlegroups.com.
--
Hugo MacDermott-Opeskin

Christos Efthymiou

unread,
Sep 12, 2022, 5:04:59 AMSep 12
to MDnalysis discussion
Hi Hugo, 

Thank you for offering some help. Here is some code that I have so far for the radius of gyration as that is all I have figured out to calculate so far: 

for ts in u.trajectory[:20]:
    time = u.trajectory.time
    rgyr = u.atoms.radius_of_gyration()
    rgyr_results = rgyr.timeseries
    np.savetext('rgyr.dat', rgyr_results, delimiter=' ')

Running this causes the following error: 

AttributeError                            Traceback (most recent call last)
Input In [8], in <cell line: 8>()
      9 time = u.trajectory.time
     10 rgyr = u.atoms.radius_of_gyration()
---> 11 rgyr_results = rgyr.timeseries
     12 np.savetext('rgyr.dat', rgyr_results, delimiter=' ')

AttributeError: 'numpy.float64' object has no attribute 'timeseries'

What have I done wrong? 

Thanks, 
Christos 

Christos Efthymiou

unread,
Sep 12, 2022, 12:10:05 PMSep 12
to MDnalysis discussion
Just to follow up, I have edited the code to the following: 

-----
for ts in u.trajectory[:20]:
    time = u.trajectory.time
    rgyr = u.atoms.radius_of_gyration()
    rgyr_results = np.array(rgyr)
    print(rgyr_results)
-----

Which gives the following output: 

-----
46.293977798181224
46.28291511989854
46.29337494557188
46.248910816691165
46.20000956913493
46.26555146743779
46.23977494833417
46.27252240880911
46.26930326244436
46.30537571671158
46.235001390294954
46.29165361818403
46.24592737562194
46.262677698603554
46.26064613258189
46.21341804492745
46.274757819186654
46.266838371808404
46.29470594976873
46.25604421245822
----

If I then try to add the following line to the bottom of the code:

-----
np.savetxt('rgyr.dat', rgyr_results, delimiter = ' ')
-----

I get this error: 

-----
ValueError                                Traceback (most recent call last)
Input In [21], in <cell line: 14>()
     20     print(rgyr_results)
     21 #    rgyr_results = rgyr.timeseries
---> 22     np.savetxt('rgyr.dat', rgyr_results, delimiter = ' ')

File <__array_function__ internals>:180, in savetxt(*args, **kwargs)

File ~\anaconda3\lib\site-packages\numpy\lib\npyio.py:1397, in savetxt(fname, X, fmt, delimiter, newline, header, footer, comments, encoding)
   1395 # Handle 1-dimensional arrays
   1396 if X.ndim == 0 or X.ndim > 2:
-> 1397     raise ValueError(
   1398         "Expected 1D or 2D array, got %dD array instead" % X.ndim)
   1399 elif X.ndim == 1:
   1400     # Common case -- 1d array of numbers
   1401     if X.dtype.names is None:

ValueError: Expected 1D or 2D array, got 0D array instead
----

I will keep working at this to solve the issue, but please let me know if you notice any errors. 

Christos Efthymiou

unread,
Sep 12, 2022, 3:22:41 PMSep 12
to MDnalysis discussion
Here is the correct code: 

----
rgyr = []
time = []
protein = u.select_atoms("protein")
for ts in u.trajectory[:10]:
    time.append(u.trajectory.time)
    rgyr.append(protein.radius_of_gyration())
   
time_rgyr = np.column_stack((time,rgyr))
rgyr_results = np.array(time_rgyr)
#print(rgyr_results)
np.savetxt('rgyr.dat', rgyr_results)
----

If you want it to work for every frame in your trajectory, remove the [:10]. 



Hugo Macdermott-Opeskin

unread,
Sep 13, 2022, 9:58:14 PMSep 13
to MDnalysis discussion
Awesome work Christos, 
Glad you got it working. 
Let us know if you have any more questions.

Cheers

Hugo 

Christos Efthymiou

unread,
Sep 24, 2022, 1:00:54 AMSep 24
to MDnalysis discussion
Hi Hugo, 

I have two additional questions. 

First, how can I set the stride for a calculation? For example, this is the code I have for calculating the RMSD over the course of my trajectory: 

````
R = rms.RMSD(u,  # universe to align
             u,  # reference universe or atomgroup
             select='backbone',  # group to superimpose and calculate RMSD
             #groupselections=[CORE, LID, NMP],  # if you want to calculate the RMSD for certain regions of the protein,
             #you can set variables like CORE = resi 80-200, and then the RMSD of that section will be calculated
             #separately and be included as a column in the R.rmsd result
             ref_frame=0)  # frame index of the reference
R.run()
R.rmsd.shape
rmsd_results = np.array(R.rmsd)
print(rmsd_results)
np.savetxt('rmsd.dat', rmsd_results)
````
Where can I set the stride and how? 

Second, I am having some trouble calculating the RMSF for my trajectory. Here is the code that I have: 

````
R = rms.RMSD(u,  # universe to align
             u,  # reference universe or atomgroup
             select='backbone',  # group to superimpose and calculate RMSD
             #groupselections=[CORE, LID, NMP],  # if you want to calculate the RMSD for certain regions of the protein,
             #you can set variables like CORE = resi 80-200, and then the RMSD of that section will be calculated
             #separately and be included as a column in the R.rmsd result
             ref_frame=0)  # frame index of the reference
R.run()
R.rmsd.shape
rmsd_results = np.array(R.rmsd)
print(rmsd_results)
np.savetxt('rmsd.dat', rmsd_results)
````

When I run that code, I get this error: DeprecationWarning: The `universe` attribute was deprecated in MDAnalysis 2.0.0 and will be removed in MDAnalysis 3.0.0. Please use `results.universe` instead. warnings.warn(wmsg, DeprecationWarning). 

An output is still generated, but it does not have a column with the residue numbers. Is it possible to modify the above code to include the residue numbers from the PSF file as the first column? 

Thank you! 

Hugo Macdermott-Opeskin

unread,
Sep 25, 2022, 6:52:44 AMSep 25
to MDnalysis discussion
Hi Christos, 

You can use the start, stop step arguments to `run()`, see this paragraph coped from the userguide 

"""Run the analysis with .run(). Optional arguments are the start frame index, stop frame index, step size, and toggling verbose. The default is to run analysis on the whole trajectory""" (https://userguide.mdanalysis.org/stable/examples/quickstart.html). 


RE the RMSF,  I think you forgot to include the RMSF code and used RMSD code instead? I think I can still answer your questions though.

The deprecation warning is mostly harmless and indicates that something will be removed in the future. 

To get the resids, use the .resid attribute of your  selection, for example, with "backbone" you could do:

u.select_atoms("backbone").resids to get the resids as a numpy array.

To save it to a two column file you would need to concatenate your two arrays using something like np.vstack(resids,rmsf ) before saving to text.

Cheers

Hugo

Christos Efthymiou

unread,
Oct 4, 2022, 1:28:17 PMOct 4
to MDnalysis discussion
Hi Hugo, 

Thank you for the help. If I want to set the step only, how would I do this? I tried using R.run(step=1000), but that did not work. I cannot find anything in the documentation for how to do this, so I appreciate your insight! 

Best, 
Christos 

Hugo Macdermott-Opeskin

unread,
Oct 4, 2022, 6:23:24 PMOct 4
to mdnalysis-...@googlegroups.com
Hi Christos, 
try setting the start to 0, the end to the number of frames and the step to the step you want. 
Cheers

Hugo

You received this message because you are subscribed to a topic in the Google Groups "MDnalysis discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mdnalysis-discussion/Kr3VecKpz04/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/76aa03fe-1c65-433e-88eb-61e21763d72an%40googlegroups.com.

Christos Efthymiou

unread,
Oct 6, 2022, 6:38:38 PMOct 6
to MDnalysis discussion
Hi Hugo, 

I am still having some trouble. Here is my code: 

````
import MDAnalysis as mda
import numpy as np
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis as HBA

u = mda.Universe("Wildtype_100ns_Test1_QwikMD.psf", "MD.dcd")

hbonds = HBA(universe=u)
hbonds.hydrogens_sel = hbonds.guess_hydrogens("protein")
hbonds.acceptors_sel = hbonds.guess_acceptors("protein")
hbonds.run(0,-1,10)

print(hbonds.results.hbonds)
np.savetxt('hbonds_protein.dat', hbonds.results.hbonds)
#hbonds_results = np.array(HBA.hbonds)
#print(hbonds_results)
#np.savetxt('hbonds_protein.dat', hbonds_results)
````

And this is the error I get: 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Input In [11], in <cell line: 11>()
      9 hbonds.hydrogens_sel = hbonds.guess_hydrogens("protein")
     10 hbonds.acceptors_sel = hbonds.guess_acceptors("protein")
---> 11 hbonds.run(0,-1,10)
     13 print(hbonds.results.hbonds)
     14 np.savetxt('hbonds_protein.dat', hbonds.results.hbonds)

File ~\anaconda3\lib\site-packages\MDAnalysis\analysis\base.py:295, in AnalysisBase.run(self, start, stop, step, verbose)
    293 logger.info("Starting preparation")
    294 self._prepare()
--> 295 for i, ts in enumerate(ProgressBar(
    296         self._trajectory[self.start:self.stop:self.step],
    297         verbose=verbose)):
    298     self._frame_index = i
    299     self._ts = ts

File ~\anaconda3\lib\site-packages\tqdm\notebook.py:258, in tqdm_notebook.__iter__(self)
    256 try:
    257     it = super(tqdm_notebook, self).__iter__()
--> 258     for obj in it:
    259         # return super(tqdm...) will not catch exception
    260         yield obj
    261 # NB: except ... [ as ...] breaks IPython async KeyboardInterrupt

File ~\anaconda3\lib\site-packages\tqdm\std.py:1183, in tqdm.__iter__(self)
   1180 # If the bar is disabled, then just walk the iterable
   1181 # (note: keep this check outside the loop for performance)
   1182 if self.disable:
-> 1183     for obj in iterable:
   1184         yield obj
   1185     return

File ~\anaconda3\lib\site-packages\MDAnalysis\coordinates\base.py:994, in FrameIteratorSliced.__iter__(self)
    992 def __iter__(self):
    993     for i in range(self.start, self.stop, self.step):
--> 994         yield self.trajectory[i]
    995     self.trajectory.rewind()

File ~\anaconda3\lib\site-packages\MDAnalysis\coordinates\base.py:1610, in ProtoReader.__getitem__(self, frame)
   1608 if isinstance(frame, numbers.Integral):
   1609     frame = self._apply_limits(frame)
-> 1610     return self._read_frame_with_aux(frame)
   1611 elif isinstance(frame, (list, np.ndarray)):
   1612     if len(frame) != 0 and isinstance(frame[0], (bool, np.bool_)):
   1613         # Avoid having list of bools

File ~\anaconda3\lib\site-packages\MDAnalysis\coordinates\base.py:1642, in ProtoReader._read_frame_with_aux(self, frame)
   1640 def _read_frame_with_aux(self, frame):
   1641     """Move to *frame*, updating ts with trajectory, transformations and auxiliary data."""
-> 1642     ts = self._read_frame(frame)  # pylint: disable=assignment-from-no-return
   1643     for aux in self.aux_list:
   1644         ts = self._auxs[aux].update_ts(ts)

File ~\anaconda3\lib\site-packages\MDAnalysis\coordinates\DCD.py:182, in DCDReader._read_frame(self, i)
    180 """read frame i"""
    181 self._frame = i - 1
--> 182 self._file.seek(i)
    183 return self._read_next_timestep()

File MDAnalysis\lib\formats\libdcd.pyx:427, in MDAnalysis.lib.formats.libdcd.DCDFile.seek()

OverflowError: Python int too large to convert to C long
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

How can I fix this? 

Hugo Macdermott-Opeskin

unread,
Oct 6, 2022, 6:56:47 PMOct 6
to MDnalysis discussion
Hi Christos, 

This looks like a low level error and is likely not your fault. 
How many frames in your trajectory? It looks to me like you are overflowing the size of C int32, (2147483647)

Christos Efthymiou

unread,
Oct 6, 2022, 7:02:12 PMOct 6
to MDnalysis discussion
Hi Hugo, 

There are 50,000 frames in my trajectory, it is only a 100ns simulation. I don't understand why setting a step size would cause a problem. I have been able to run RMSD analysis on all the frames, so shouldn't a step size make it even easier to run analysis? 

Hugo Macdermott-Opeskin

unread,
Oct 7, 2022, 8:25:46 PMOct 7
to MDnalysis discussion
Would you be able to attatch your trajectory so I can try and reproduce the error?
Cheers

Hugo

Christos Efthymiou

unread,
Oct 10, 2022, 6:23:36 PMOct 10
to MDnalysis discussion
Hi Hugo, 

Unfortunately my trajectory is too large to be attached. It is 45 GB. Is there any other way to speed up the calculation process other than setting a step size? I was using VMD for smaller trajectory files (20 GB), and the calculations for things like RMSD and RMSF were performed quite rapidly (minutes). MD Analysis is taking several hours to perform just the RMSD analysis for one trajectory, so I am unable to complete my analysis in an efficient way. 

Best, 
Christos 

Irfan Alibay

unread,
Oct 10, 2022, 6:34:58 PMOct 10
to MDnalysis discussion
Hi Christos, 

By any chance are you using a Windows machine? 

This unfortunately sounds like another case of this error: https://github.com/MDAnalysis/mdanalysis/issues/3075

> I was using VMD for smaller trajectory files (20 GB), and the calculations for things like RMSD and RMSF were performed quite rapidly (minutes). MDAnalysis is taking several hours to perform just the RMSD analysis for one trajectory, so I am unable to complete my analysis in an efficient way. 

This doesn't seem right, here do you mean that when using the 20 GB file with MDAnalysis it is significantly slower than VMD? Are you also accounting for RMS alignment as part of the VMD timing here?

Best,

Irfan

Naveen Michaud-Agrawal

unread,
Oct 10, 2022, 7:51:45 PMOct 10
to mdnalysis-...@googlegroups.com
Hi Christos,

Are you using a 64 bit version of python?

Regards,
Naveen

Christos Efthymiou

unread,
Oct 11, 2022, 12:45:34 PMOct 11
to MDnalysis discussion
Hi, 

Yes I am using a Windows machine. I see a workaround was provided for that specific scenario, but I am not sure how to apply it to my case. For example, here is my code for RMSD: 

#Calculate RMSD
from MDAnalysis.analysis import rms

u = mda.Universe("R:/Wildtype_100ns_Test2/Wildtype_100ns_Test2/run/Wildtype_100ns_Test2_QwikMD.psf", "R:/Wildtype_100ns_Test2/Wildtype_100ns_Test2/run/MD.dcd")


R = rms.RMSD(u,  # universe to align
             u,  # reference universe or atomgroup
             select='backbone',  # group to superimpose and calculate RMSD
             #groupselections=[CORE, LID, NMP],
             ref_frame=0)  # frame index of the reference
R.run(0,-1,10)

R.rmsd.shape
rmsd_results = np.array(R.rmsd)
print(rmsd_results)
np.savetxt('rmsd_10stride.dat', rmsd_results)

How can I implement the given workaround in this scenario? 

Yes, I am not sure why but MD Analysis seems to take exponentially longer than VMD. For example, I just ran an RMSD analysis with MD Analysis and it took 14 hours to complete. I am not sure what the RMS alignment means timing-wise for VMD. In VMD I would load a trajectory (~15 minutes) and then the RMSD would be calculated within about 2 minutes. 

Best, 
Christos 

Christos Efthymiou

unread,
Oct 11, 2022, 12:46:43 PMOct 11
to MDnalysis discussion
Hi Naveen, 

Yes, I believe it is a 64 bit version. The output of this cell

import sys
print(sys.maxsize)

was 9223372036854775807. 

Best, 
Christos 
Reply all
Reply to author
Forward
0 new messages