Sanity check the use of MDAnalysis.analysis.contacts

distribut...@googlemail.com

unread,

Jun 18, 2021, 12:31:27 PM6/18/21

to MDnalysis discussion

Dear all,

I would be very grateful if someone could sanity check whether I have interpreted the use of MDAnalysis.analysis.contacts correctly for the purpose discussed below. I would rather hold an open discussion now, rather than find out any error/concerns later when a mistake would cost me dearly.

Molecular Dynamics Model:

A solvated DPPC and POPC bilayer (with counter ions) and four identical helical transmembrane peptides, spaced evenly apart. Forcefield: Martini (latest, version 3 beta). MD suite: Gromacs. The system is left to minimise (steepest descent) followed by several rounds of equilibration and a final production NPT simulation.

MDAnalysis work:

I am using the MDAnalysis.analysis.contacts code to calculate the number of protein-protein interactions. Am I correctin assuming that a protein selection will include contacts within itself (same chain proteins) and with neighbouring protein (different chain proteins)?

Objective:

As the simulation evolves with time, different oligomeric states will report a unique range of protein-protein contacts.

MDAnalysis Code and Example:

Consider the four helical peptide system discussed above. They can conform to the following oligomeric states with the order given as the number of reported protein-protein contacts:

four monomers << dimer + two monomers << trimer + monomer ?? two dimers << fourmer

Notice the relationship "trimer + monomer ?? two dimers"? That's because I'm yet to see whether a trimer yields more protein-protein contacts than two dimers.

This is my code:

----------------------------------------------------------------------------------------------------------

# MDAnalysis contact analysis

##Class types - always very useful to have for speedy documentation lookup

#AG == "MDAnalysis.core.groups.AtomGroup"

#TS == "MDAnalysis.coordinates.base.Timestep"

#Array == "numpy.ndarray"

import numpy

import pandas

import matplotlib.pyplot as plt

import MDAnalysis

from MDAnalysis.analysis import contacts

radius = 10

groFileStr: str = "D:/DE_NOVO_PROTEINS/PROTEIN_ACCESSIBILITY/npt_eq_1.gro"

xtcFileStr: str = "D:/DE_NOVO_PROTEINS/PROTEIN_ACCESSIBILITY/npt_eq_1.xtc"

universe = MDAnalysis.Universe(groFileStr, xtcFileStr)

proteinSelectionStr: str = "protein"

proteinAG = universe.select_atoms(proteinSelectionStr)

timeSeriesList = []

for timeStepTS in universe.trajectory:

distanceArray = contacts.distance_array(proteinAG.positions, proteinAG.positions)

#count the number of contacts then convert back to native Int type

contactsInt = contacts.contact_matrix(distanceArray, radius).sum().item()

timeSeriesList.append([timeStepTS.frame, contactsInt])

timeSeriesArray = numpy.array(timeSeriesList)

timeSeriesDF = pandas.DataFrame(timeSeriesArray, columns=['Frame', 'no. Contacts'])

timeSeriesDF.plot(x='Frame')

plt.ylabel("no. Protein contacts")

----------------------------------------------------------------------------------------------------------

One of my many test simulations produces the following output:

And I've taken the liberty of visualising four frames, 0, 200, 400 and 600 of the helices from above.

I am interested to know whether:

a) I have interpreted the functionalist of MDAnalysis.analysis.contacts correctly?

b) I might have missed something, but will MDAnalysis.analysis.contacts factor in 3D periodic boundary conditions of my unit cell? Or must I rectify the PBC for my unit cell first?

Many thanks

Anthony

Oliver Beckstein

unread,

Jun 22, 2021, 3:23:20 PM6/22/21

to mdnalysis-discussion

Hi Anthony,

Sorry for some short comments — apologies for brevity/possible inaccuracies. Comments inline.

On Jun 18, 2021, at 9:31 AM, 'distribut...@googlemail.com' via MDnalysis discussion <mdnalysis-...@googlegroups.com> wrote:

Dear all,

I would be very grateful if someone could sanity check whether I have interpreted the use of MDAnalysis.analysis.contacts correctly for the purpose discussed below. I would rather hold an open discussion now, rather than find out any error/concerns later when a mistake would cost me dearly.

Molecular Dynamics Model:
A solvated DPPC and POPC bilayer (with counter ions) and four identical helical transmembrane peptides, spaced evenly apart. Forcefield: Martini (latest, version 3 beta). MD suite: Gromacs. The system is left to minimise (steepest descent) followed by several rounds of equilibration and a final production NPT simulation.

MDAnalysis work:
I am using the MDAnalysis.analysis.contacts code to calculate the number of protein-protein interactions.

Just a note: You’re using the building blocks (and not the full-fledged ContactAnalysis). distance_array is really lib.distances.distance_array. You can look into using lib.distances.capped_distances() as a faster alternative.

Am I correctin assuming that a protein selection will include contacts within itself (same chain proteins) and with neighbouring protein (different chain proteins)?

Your proteinAG below will contain every atom that is in a “protein”. So when you use the positions in the distance_array() you’ll also get self-contacts.

You are missing PBC.

distance_array(x, y, box=ts.dimensions)

#count the number of contacts then convert back to native Int type
contactsInt = contacts.contact_matrix(distanceArray, radius).sum().item()

You might be double counting here if your distance matrix is symmetric.

timeSeriesList.append([timeStepTS.frame, contactsInt])

timeSeriesArray = numpy.array(timeSeriesList)
timeSeriesDF = pandas.DataFrame(timeSeriesArray, columns=['Frame', 'no. Contacts'])

timeSeriesDF.plot(x='Frame')
plt.ylabel("no. Protein contacts")
----------------------------------------------------------------------------------------------------------

One of my many test simulations produces the following output:

<contacts_test.jpg>
And I've taken the liberty of visualising four frames, 0, 200, 400 and 600 of the helices from above.

<oligomers.jpg>
I am interested to know whether:
a) I have interpreted the functionalist of MDAnalysis.analysis.contacts correctly?

Generally yes, see comments above.

Also note that ContactAnalysis includes various ways to transform the cutoff.

b) I might have missed something, but will MDAnalysis.analysis.contacts factor in 3D periodic boundary conditions of my unit cell? Or must I rectify the PBC for my unit cell first?

The ContactAnalysis class will automatically do PBC. In your example you have to add it by using the box kwarg. Once you do, all distances are calculated with minimum image convention and you don’t have to preprocess.

capped_distances might be faster — I am not sure why we’re not using it in the ContactAnalysis code — maybe Richard or Hugo know.

Oliver

Many thanks
Anthony

--
You received this message because you are subscribed to the Google Groups "MDnalysis discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/3224832f-e518-452a-8ce0-c1cf4466ebc4n%40googlegroups.com.
<oligomers.jpg><contacts_test.jpg>

--

Oliver Beckstein (he/his/him)

email: orbe...@mdanalysis.org
twitter: @orbeckst

MDAnalysis – a NumFOCUS fiscally sponsored project

https://www.mdanalysis.org/

Lily Wang

unread,

Jun 22, 2021, 3:43:45 PM6/22/21

to mdnalysis-...@googlegroups.com

capped_distances might be faster — I am not sure why we’re not using it in the ContactAnalysis code — maybe Richard or Hugo know.

That would be a decent speed boost. The code was first written with distance_array in 2010 and capped_distances was added ~8 years later, so probably no one thought to update it.

Cheers,

Lily

To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-discussion/A8B6FCCD-0D72-4B70-93EE-DA4EA8519767%40mdanalysis.org.

distribut...@googlemail.com

unread,

Jun 23, 2021, 6:53:41 AM6/23/21

to MDnalysis discussion

Hi Lily and Oliver,

Thank you so much for your comments. I'm dashing between projects and students at the moment so I'm hoping to revisit this work in a few days. I'll digest your feedback and let you know how it goes. Again, thanks for your assistance in my understanding.

Anthony

Reply all

Reply to author

Forward