Hello MDAnalysis wizards,
I wrote a MDA script to accumulate contacts (within 4.5 Angstroms) between residues. I pasted in the business part below; it used the capped_distance method, which is nice and fast (for my system, it returns the list of pairs in around 0.01 s per frame on my ok machine). I'm writing today to see if there are any tips to speed up the slowest part (noted in the comment below). It turns out that converting the map of tuples into a unique countable list of tuples takes around a half-second for this system (going from around 50,000 atom-atom pairs down to around 2000 residue-residue contacts). here, I try to make it easier on the eyes:
resid_neighs = set(
map(
lambda x:
tuple(sorted(
[
protein_group.atoms[x[0]].resid,
protein_group.atoms[x[1]].resid
]
)), neighs))
Perhaps there is a faster way to do this with other MDA library functions? or with python; I'm still learning python. (Aside: I went through the contact_matrix example for the salt-bridges, but it doesn't seem to provide a simpler/faster path to the residue-residue contact)
Thank you for creating this wonderful library!
Demian
business part (add one to residue-residue contact counter if at least one pair of atoms is within 4.5 angstroms):
-----------------------------------------------------
contact_map = Counter()
for ts in u.trajectory:
# all atom-atom contacts within 4.5 angstroms
neighs = mda_distances.capped_distance(
protein_group.positions,
protein_group.positions,
4.5,
return_distances = False )
# set on sorted tuples removes duplicates (slowest piece)
resid_neighs = set(map(lambda x: tuple(sorted([protein_group.atoms[x[0]].resid,protein_group.atoms[x[1]].resid])), neighs))
for contact in resid_neighs:
contact_map[contact] += 1