MDAnalysis -Distance between two residues over time

740 views
Skip to first unread message

Juliette Newell

unread,
Jun 20, 2022, 7:09:22 AM6/20/22
to MDnalysis discussion
Dear MDAnalysis users,

I am trying to calculate the distance between the centre of mass of two residues over the trajectory.

import numpy as np
import MDAnalysis as mda
from MDAnalysis.analysis import distances

u = mda.Universe("mol.top", "traj.nc"

res_te = u.select_atoms('resid 28')
res_tt = u.select_atoms('resid 32')

sel_te = res_te.center_of_mass(compound=('residues'))
sel_tt = res_tt.center_of_mass(compound=('residues'))

n_te = len(sel_te)
n_te = len(sel_te)

dist = distances.distance_array(n_te, n_tt,
                                                        box = u.dimensions)

However this gives me a single distance instead of the distance for each frame.

Is anyone able to help please?

Thank you,
Juliette




Richard Gowers

unread,
Jun 20, 2022, 7:14:59 AM6/20/22
to Mdnalysis-Discussion
Hi Juliette,

MDAnalysis only loads a single frame of trajectory at a time.  To loop through your trajectory you can iterate "u.trajectory":

dists = []
for ts in u.trajectory:
    dist = distances.distance_array(...)
    dists.append(dist)

You will have to recalculate the variables sel_te and sel_tt inside the loop as well so that these are recalculated for each frame!  If you want to build a timeseries, I'd recommend appending to a list as I have included for you.

Thanks
Richard

--
You received this message because you are subscribed to the Google Groups "MDnalysis discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/979834b1-7c4e-4e1c-9269-2582865f37dfn%40googlegroups.com.

Juliette Newell

unread,
Jun 20, 2022, 7:51:23 AM6/20/22
to MDnalysis discussion
Hi Richard,

Thank you for your help.

I am very new to MDAnalysis/python.

So to create a loop, i make an empty list : dists =[]
then i start the loop

for ts in u.trajectory:
       res_te = u.select_atoms('resid 28')
       res_tt = u.select_atoms('resid 32')

       sel_te = res_te.center_of_mass(compound=('residues'))
       sel_tt = res_tt.center_of_mass(compound=('residues'))

       n_te = len(sel_te)
        n_tt = len(sel_tt)
       dist = distances.distance_array(n_te, n_tt,
                                                               box = u.dimensions)
      dists.append(dist)

Does that look right?

Thanks again,
Juliette 

Richard Gowers

unread,
Jun 21, 2022, 5:13:09 AM6/21/22
to Mdnalysis-Discussion
Hi Juliette,

Yes that looks correct.  Looking at it again, your "len" calls gives the number of items inside an object.  E.g. for a list with 4 entries, len returns "4".  I think you'll want to put in the sel_te and sel_tt variables (these are coordinates) into distance array.

Richard

Juliette Newell

unread,
Jun 21, 2022, 7:58:02 AM6/21/22
to MDnalysis discussion
Hi Richard,

Thanks again for your help.  I think I have just about got the script working as the distance between the COM of each residue per frame.  However the results aren't as anticipated as in the simulation the residues I have selected get closer with time, but the script I have used shows the distance increasing over time, so I am wondering if it is a problem with my script and how I have defined the distances?

from MDAnalysis.analysis import distances
import csv
import pandas as pd
import pickle


u = mda.Universe("CDPA2.top", "CDPA_reimage.nc")

with open ('dist3.csv', 'w') as f:
    writer = csv.writer(f)

dists =[]
for ts in u.trajectory[:10000:1]:

    res_te = u.select_atoms('resid 28')
    res_tt = u.select_atoms('resid 31')
    sel_te = res_te.center_of_mass(compound='residues')
    sel_tt = res_tt.center_of_mass(compound='residues')
    dist=distances.distance_array(sel_te,sel_tt,
        box=u.dimensions)
    dists.append(dist)

narr = np.array(dists)

arr = narr.reshape(10000,1)

df = pd.DataFrame(arr)

df.to_csv("dist3.csv")

Thank you for your help
Juliette







.

Tamas Hegedus

unread,
Jun 21, 2022, 8:43:38 AM6/21/22
to mdnalysis-...@googlegroups.com
Dear Juliette,

I think the script is overcomplicated, thus it may have more bugs, it may be more difficult to debug than a simpler one.
Start with a very simple one like below and then try to modify it for your objectives, gradually getting more complex.

E.g.

res_te = u.select_atoms('resid 28')
res_tt = u.select_atoms('resid 31')
# always try to place the selection outside the for cycle to increase performance
# you could also start with "resid and name CA"; depending on your objectives, using CA instead of center of mass may be sufficient for you and faster to calculate if you want distance between all amino acid pairs
dL = []
for ts in xxx:
  com1 = res_te.center_of_mass()
  com2 = res_tt.center_of_mass()
  dL.append(numpy.linalg.norm(com1-com2))
  # if you use distance array, you can increase performance if you have a predefined numpy array for the results (see mda docs)

open("distances.csv", "w").write(",".join(map(str, dL)) 
# I am sorry for this :-) The message: I find pandas an overkill here

Tamas
Reply all
Reply to author
Forward
0 new messages