Dear all,
I would be very grateful if someone could sanity check whether I have interpreted the use of MDAnalysis.analysis.contacts correctly for the purpose discussed below. I would rather hold an open discussion now, rather than find out any error/concerns later when a mistake would cost me dearly.
Molecular Dynamics Model:
A solvated DPPC and POPC bilayer (with counter ions) and four identical helical transmembrane peptides, spaced evenly apart. Forcefield: Martini (latest, version 3 beta). MD suite: Gromacs. The system is left to minimise (steepest descent) followed by several rounds of equilibration and a final production NPT simulation.
MDAnalysis work:
I am using the MDAnalysis.analysis.contacts code to calculate the number of protein-protein interactions. Am I correctin assuming that a protein selection will include contacts within itself (same chain proteins) and with neighbouring protein (different chain proteins)?
Objective:
As the simulation evolves with time, different oligomeric states will report a unique range of protein-protein contacts.
MDAnalysis Code and Example:
Consider the four helical peptide system discussed above. They can conform to the following oligomeric states with the order given as the number of reported protein-protein contacts:
four monomers << dimer + two monomers << trimer + monomer ?? two dimers << fourmer
Notice the relationship "trimer + monomer ?? two dimers"? That's because I'm yet to see whether a trimer yields more protein-protein contacts than two dimers.
This is my code:
----------------------------------------------------------------------------------------------------------
# MDAnalysis contact analysis
##Class types - always very useful to have for speedy documentation lookup
#AG == "MDAnalysis.core.groups.AtomGroup"
#TS == "MDAnalysis.coordinates.base.Timestep"
#Array == "numpy.ndarray"
import numpy
import pandas
import matplotlib.pyplot as plt
import MDAnalysis
from MDAnalysis.analysis import contacts
radius = 10
groFileStr: str = "D:/DE_NOVO_PROTEINS/PROTEIN_ACCESSIBILITY/npt_eq_1.gro"
xtcFileStr: str = "D:/DE_NOVO_PROTEINS/PROTEIN_ACCESSIBILITY/npt_eq_1.xtc"
universe = MDAnalysis.Universe(groFileStr, xtcFileStr)
proteinSelectionStr: str = "protein"
proteinAG = universe.select_atoms(proteinSelectionStr)
timeSeriesList = []
for timeStepTS in universe.trajectory:
distanceArray = contacts.distance_array(proteinAG.positions, proteinAG.positions)
#count the number of contacts then convert back to native Int type
contactsInt = contacts.contact_matrix(distanceArray, radius).sum().item()
timeSeriesList.append([timeStepTS.frame, contactsInt])
timeSeriesArray = numpy.array(timeSeriesList)
timeSeriesDF = pandas.DataFrame(timeSeriesArray, columns=['Frame', 'no. Contacts'])
timeSeriesDF.plot(x='Frame')
plt.ylabel("no. Protein contacts")
----------------------------------------------------------------------------------------------------------
One of my many test simulations produces the following output:
And I've taken the liberty of visualising four frames, 0, 200, 400 and 600 of the helices from above.
I am interested to know whether:
a) I have interpreted the functionalist of MDAnalysis.analysis.contacts correctly?
b) I might have missed something, but will MDAnalysis.analysis.contacts factor in 3D periodic boundary conditions of my unit cell? Or must I rectify the PBC for my unit cell first?
Many thanks
Anthony