distances between two groups of atoms?

1,236 views
Skip to first unread message

jmborr

unread,
Mar 21, 2012, 6:26:42 PM3/21/12
to MDnalysis discussion
Dear MDAnalysis users,

I want to calculate the distances between atoms in two atomGroup
objects, the number of atoms need not be the same.
I could not find a dedicated function. Is my only chance to resort to
cast the coordinates to ndarray and use numpy?

Best regards,
-Jose

Oliver Beckstein

unread,
Mar 21, 2012, 6:38:10 PM3/21/12
to mdnalysis-...@googlegroups.com
Hi José,

On 21 Mar, 2012, at 15:26, jmborr wrote:

> I want to calculate the distances between atoms in two atomGroup
> objects, the number of atoms need not be the same.
> I could not find a dedicated function. Is my only chance to resort to
> cast the coordinates to ndarray and use numpy?

Probably the easiest is to use MDAnalysis.analysis.distances.distance_array on the coordinates of two AtomGroups, eg

g1 = universe.selectAtoms("(resname ARG or resname LYS) and not backbone")
g2 = universe.selectAtoms("(resname ASP or resname GLU) and not backbone")
dist = MDAnalysis.analysis.distances.distance_array(g1.coordinates(), g2.coordinates())

This will give you the "matrix" of distances.

You can also look at MDAnalysis.analysis.contacs for some contact analysis classes.

Oliver

--
Oliver Beckstein * orbe...@gmx.net
skype: orbeckst * orbe...@gmail.com

jmborr

unread,
Mar 21, 2012, 8:23:44 PM3/21/12
to MDnalysis discussion
Thank you Oliver, this will help a lot :)

At the risk of being a spammer, I have another question. Would the
statement universe.selectAtoms("protein around 3.5 ans resname WAT")
provide a different set of atoms as I iterate in a trajectory?
Something like this:

for ts in universe.trajectory:
atoms=universe.selectAtoms("protein around 3.5 AND resname WAT")

-Jose
> Oliver Beckstein * orbec...@gmx.net
> skype: orbeckst  * orbec...@gmail.com

Oliver Beckstein

unread,
Mar 21, 2012, 8:33:22 PM3/21/12
to mdnalysis-...@googlegroups.com
On 21 Mar, 2012, at 17:23, jmborr wrote:

> Thank you Oliver, this will help a lot :)

That's good :-)

>
> At the risk of being a spammer, I have another question.

This is totally on-topic so I am certainly not going to ban you from the mailing list for asking sensible questions!

> Would the statement universe.selectAtoms("protein around 3.5 ans resname WAT")
> provide a different set of atoms as I iterate in a trajectory?
> Something like this:
>
> for ts in universe.trajectory:
> atoms=universe.selectAtoms("protein around 3.5 AND resname WAT")

Yes, typically atoms will change in size between timesteps. When I do this I typically add a line such as

print "Solvation shell with %d waters" % atoms.numberOfResidues()

Also note that you can get the water molecules with

atoms.residues

Note that the distance selection can be made a little bit faster by not using the selection parser but explicitly constructing the KD-Tree, as described under http://packages.python.org/MDAnalysis/documentation_pages/KDTree/NeighborSearch.html

Cheers,
Oliver

--

jmborr

unread,
Mar 22, 2012, 11:23:11 AM3/22/12
to MDnalysis discussion
Sweet! I wanted to find pairs of solvent-protein atoms in contact. The
KDE-tree will help me reduce the number of distance computation:

water = universe.selectAtoms('name OH2')
protein = universe.selectAtoms('protein and not H*')

for frame in ts.trajectory:
ns_w = NeighborSearch.AtomNeighborSearch(water)
ns_p = NeighborSearch.AtomNeighborSearch(protein)
closeWAT = ns_w.search_list(protein,4.6)
closePROT=ns_p.search_list(water,4.6)

distance_array( frame[ closeWAT.indices() ],frame[ closePROT.indices()] )
#small matrix, easier to look for contact pairs

-Jose
> Note that the distance selection can be made a little bit faster by not using the selection parser but explicitly constructing the KD-Tree, as described underhttp://packages.python.org/MDAnalysis/documentation_pages/KDTree/Neig...
>
> Cheers,
> Oliver
>
> --

Oliver Beckstein

unread,
Mar 22, 2012, 4:29:06 PM3/22/12
to mdnalysis-...@googlegroups.com

On 22 Mar, 2012, at 08:23, jmborr wrote:

> Sweet! I wanted to find pairs of solvent-protein atoms in contact. The
> KDE-tree will help me reduce the number of distance computation:

That should work but depending on system sizes you can probably make it a little bit faster (check with timeit, %time/%timeit in IPython or the python profilers), see comments below.

>
> water = universe.selectAtoms('name OH2')
> protein = universe.selectAtoms('protein and not H*')
>
> for frame in ts.trajectory:
> ns_w = NeighborSearch.AtomNeighborSearch(water)
> ns_p = NeighborSearch.AtomNeighborSearch(protein)
> closeWAT = ns_w.search_list(protein,4.6)
> closePROT=ns_p.search_list(water,4.6)

Better:

closePROT=ns_p.search_list(closeWAT, 4.6)

(only uses the waters that we know are close)

>
> distance_array( frame[ closeWAT.indices() ],frame[ closePROT.indices()] )

Use coordinates() directly:

distance_array( closeWAT.coordinates(), closePROT.coordinates() )


However, I quickly tested this with the GRO/XTC pair in the test files and found that it is faster (~x 2 [1]) not to build a KD-tree for the protein but directly calculate the distance array:

for frame in ts.trajectory:
ns_w = NeighborSearch.AtomNeighborSearch(water)

closeWAT = ns_w.search_list(protein,4.6)
d = distance_array( closeWAT.coordinates(), protein.coordinates() )

You can then get a list of distances (e.g. for a histogram) as d[d < 4.6] or the total number as np.sum(d < 4.6). Getting individual pairs can be done with np.where(d<4.6).

Oliver


[1] For reference (in IPython):

import MDAnalysis;
from MDAnalysis.tests.datafiles import *
import MDAnalysis.KDTree.NeighborSearch

u = MDAnalysis.Universe(GRO,XTC)
protein = u.selectAtoms("protein and not name H*")
water = u.selectAtoms("resname SOL and name OW")

# the following would be done for each time step

ns_w = MDAnalysis.KDTree.NeighborSearch.AtomNeighborSearch(water)
shell = ns_w.search_list(protein, 4.6)

# %timeit is IPython's built-in access to the timeit benchmarking module
# (ran on a Macbook Pro, 2.66 GHz Core Duo))

%timeit -n 10 -r 3 d = distance_array(protein.coordinates(), shell.coordinates())
#10 loops, best of 3: 19 ms per loop

%timeit -n 10 -r 3 ns_p = MDAnalysis.KDTree.NeighborSearch.AtomNeighborSearch(protein); close_p = ns_p.search_list(shell, 4.6)
#10 loops, best of 3: 44.4 ms per loop

jmborr

unread,
Mar 22, 2012, 5:11:14 PM3/22/12
to MDnalysis discussion

> Better:
>
> closePROT=ns_p.search_list(closeWAT, 4.6)
>
> (only uses the waters that we know are close)
>
Hahaha, I should have seen this one!

> You can then get a list of distances (e.g. for a histogram) as d[d < 4.6] or the total number as np.sum(d < 4.6). Getting individual pairs can be done with np.where(d<4.6).

This was a huge speed improvement. I was casting the distance matrix
to scipy.sparse.coo_matrix to get those pairs. Quite convoluted.

I will keep the KDE-tree for the protein just for symmetry purposes,
in case I want to compute contacts between two types of solvent in a
mixture. I can live with a 2x decrease because it is very fast as it
is now. You should've seen my first implementation, hahaha.

Thank you so much for your help, Oliver
-Jose
Reply all
Reply to author
Forward
0 new messages