learning MD/PCA, simple illustrative example of PCA for a MD run?


James Metz

Dec 19, 2022, 10:59:25 PM
to MDnalysis discussion
Hello,

I am interested in learning how to use PCA following an MD run, and especially
how to interpret the results. I have begun collecting and studying the
literature. Is there some very small and simple example structure, e.g., a
peptide, for which gas-phase MD yields a trajectory that can be analyzed by PCA
and where the results and their interpretation (motion vectors, or points in
PC vs. PC plots) are very clear?

The MD/PCA literature seems to imply that this technique is mostly useful for
analyzing "large scale" overall protein conformational changes. I am curious
whether the method is also sensitive to smaller changes, e.g., motions of
several amino acids in one or a few helices that might be a signal for agonist
activity of a CNS receptor. I greatly welcome comments, suggestions for small
model systems as positive controls, etc. Thank you.

Regards,
Jim Metz

Hugo Macdermott-Opeskin

Dec 22, 2022, 8:59:28 PM
to MDnalysis discussion
Hi Jim, 

Welcome to the MDAnalysis mailing list. The first thing I'll say is that this list is focused on the MDAnalysis software rather than general questions about MD, but you can indeed run PCA with MDAnalysis.

The quick answer to your question is that, because PCA re-orients your data along the axes of maximum variance, you will always get the largest motions present in your dataset as the first few principal components. This website gives a great visual example: https://setosa.io/ev/principal-component-analysis/

The key is recognising that what counts as the largest motion depends on the system. For example, in a small peptide system the motions of a few amino acids could indeed be a dominant principal component, even though we would call these motions "small" in more objective terms. It all depends on the size of the motion relative to the other motions in the dataset.
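
To make the "relative size" point concrete, here is a minimal NumPy sketch on toy synthetic data, not a real trajectory. The amplitudes, the split into a "local" and a "global" motion, and all variable names are assumptions made up for illustration. The same small local motion is PC1 when nothing larger is present, and drops out of PC1 once a larger motion is added.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_coords = 500, 30              # toy "trajectory": 10 atoms x 3 coordinates
t = np.linspace(0, 20 * np.pi, n_frames)

# Two orthogonal unit displacement patterns (both hypothetical)
local = np.zeros(n_coords)
local[:6] = 1 / np.sqrt(6)                # coordinates of 2 atoms move together
glob = np.zeros(n_coords)
glob[6:] = 1 / np.sqrt(24)                # coordinates of the other 8 atoms

noise = 0.05 * rng.standard_normal((n_frames, n_coords))
small_system = 0.5 * np.outer(np.sin(t), local) + noise
large_system = small_system + 5.0 * np.outer(np.sin(0.37 * t), glob)

def first_pc(X):
    """Return PC1 and the fraction of total variance it explains."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0], s[0] ** 2 / np.sum(s ** 2)

pc1_small, frac_small = first_pc(small_system)
pc1_large, frac_large = first_pc(large_system)

# |cosine| between PC1 and each planted pattern (the sign of a PC is arbitrary)
overlap_small = abs(pc1_small @ local)
overlap_large_local = abs(pc1_large @ local)
overlap_large_global = abs(pc1_large @ glob)
print(overlap_small, overlap_large_local, overlap_large_global)
```

In this toy setup the local motion has the same absolute amplitude in both datasets; only its rank among the motions present changes.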

Cheers 

Hugo :)

James Metz

Dec 22, 2022, 10:09:53 PM
to MDnalysis discussion
Hugo,
Thank you for your reply. Sorry if my question is not of the form "How do I
write Python and MDAnalysis code to do XYZ ..." If there is a better website
or blog for this and similar questions concerning general MD setup, analysis
techniques, and interpretation, please let me know. Where can one go to have
an email chat with "experts"? Yes, I do own several of the classic MD books.

I am aware of the general principles behind PCA, and the website you suggested is
yet another good intro tutorial.  Thank you.

My question is essentially about whether there is some simple, educational MD PCA example
that could also serve as a positive control, especially one where the points in the PC plots are
reasonably well understood. I am able to run MD using MOE software and I
have a MOE utility program that supposedly generates PCs given an MD trajectory, but I don't know
how to interpret the results, or if the results are "correct." Hence my interest in a positive
control, just to make sure everything is working as expected.

Also, the utility program I have access to can create subsets from the trajectory snapshots,
and supposedly perform PCA on the subsets. Hence, perhaps I can focus on the relative
movements of a few amino acids of interest even if they are not involved in the larger scale
motions, or are obscured by them. Hmmm...
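
The subset idea can be sketched with synthetic data. This is only an illustration under stated assumptions: the array `traj`, the atom indices in `subset`, and the planted amplitudes are all hypothetical, and the code does not reproduce what the MOE utility does. A real trajectory would also need its frames superimposed before any PCA, and the choice of atoms used for that fit can change the result.

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames, n_atoms = 400, 20
subset = np.arange(4)                      # hypothetical atoms of interest

# Planted motions: a small shift of the subset along x,
# and a large shift of all other atoms along z.
a = rng.standard_normal(n_frames)          # local amplitude per frame
b = rng.standard_normal(n_frames)          # global amplitude per frame
traj = 0.05 * rng.standard_normal((n_frames, n_atoms, 3))
traj[:, subset, 0] += 0.5 * a[:, None]
traj[:, 4:, 2] += 5.0 * b[:, None]

def pca(coords):
    """PCA of coordinates with shape (n_frames, n_atoms, 3)."""
    X = coords.reshape(len(coords), -1)
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt, s ** 2 / (len(X) - 1)       # rows of vt are the PCs

pcs_all, var_all = pca(traj)
pcs_sub, var_sub = pca(traj[:, subset])

# Share of PC1's squared weight on the atoms of interest (all-atom PCA)
weight_on_subset = np.sum(pcs_all[0].reshape(n_atoms, 3)[subset] ** 2)
# Share of PC1's squared weight on the x coordinates (subset-only PCA)
x_weight = np.sum(pcs_sub[0].reshape(len(subset), 3)[:, 0] ** 2)
print(weight_on_subset, x_weight)
```

In this toy case the all-atom PC1 carries almost no weight on the atoms of interest, while PC1 of the subset-only PCA picks up the planted local motion.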

Again, I greatly welcome comments, suggestions, examples, etc.

Regards,
Jim Metz

Hugo Macdermott-Opeskin

Dec 22, 2022, 10:45:51 PM
to MDnalysis discussion
Hi Jim,

As regards a simple educational example, I imagine alanine dipeptide would be a good start. It has been done to death as a toy system for just about everything you can think of, and its dynamics are very well understood, e.g. https://pubs.acs.org/doi/10.1021/jp100950w

I think I need clarification on what you mean by a positive control. Are you aiming to test whether your script produces correct results, or are you looking for something to compare your results to? AFAIK the second is impossible, because for PCAs to be comparable they need to share the same variables and observations, i.e. they need to be computed for the same structure.
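
If the aim is the first option (checking that a script gives correct results), one possible positive control is synthetic data with a known answer. The sketch below is an assumption-laden example, not an established protocol: `make_control` and `reference_pca` are hypothetical helper names, and the planted variances are arbitrary. It builds data with known principal axes and variances, then checks that a PCA routine recovers them. Any other PCA implementation could be swapped in for `reference_pca` and checked the same way.

```python
import numpy as np

def make_control(n_frames=20000, variances=(9.0, 4.0, 1.0, 0.25), seed=2):
    """Synthetic data whose true principal axes and variances are known."""
    rng = np.random.default_rng(seed)
    d = len(variances)
    # Random orthonormal axes from a QR decomposition
    axes, _ = np.linalg.qr(rng.standard_normal((d, d)))
    scores = rng.standard_normal((n_frames, d)) * np.sqrt(variances)
    return scores @ axes.T, axes, np.asarray(variances)

def reference_pca(X):
    """Plain covariance-matrix PCA; columns of the result are the PCs."""
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]        # re-sort to descending variance
    return evecs[:, order], evals[order]

X, true_axes, true_var = make_control()
pcs, var = reference_pca(X)

# |cosine| between each recovered PC and its planted axis (sign is arbitrary)
overlaps = np.abs(np.sum(pcs * true_axes, axis=0))
print(overlaps)
print(var / true_var)
```

Overlaps near 1 and variance ratios near 1 suggest the routine is doing what it should on this control; they say nothing about how to interpret a real trajectory.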

Interpreting PCA is always tricky. It may sound tautological, but a PCA is no more or less than the data you already have, re-oriented along axes of maximum variance. You can use it to draw out what the largest motions are and to see approximate relationships between motions, but beyond that it's up to you to draw your own meaning.
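
The "re-oriented data" point can be checked numerically. The following is a small NumPy sketch on arbitrary correlated toy data (the data and variable names are assumptions for illustration): projecting onto all principal components and rotating back recovers the original data, and the total variance is unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
# Correlated toy data: 200 "frames", 6 coordinates
X = rng.standard_normal((200, 6)) @ rng.standard_normal((6, 6))
mean = X.mean(axis=0)
Xc = X - mean

_, s, vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ vt.T                     # coordinates of each frame along the PCs

# Rotating back with all components recovers the original data
X_back = scores @ vt + mean

# Variance is redistributed among the axes, not created or destroyed
total_before = Xc.var(axis=0, ddof=1).sum()
total_after = scores.var(axis=0, ddof=1).sum()

# Cumulative fraction of variance captured by the first k components
cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
print(np.allclose(X_back, X), total_before, total_after)
print(cumulative)
```

The cumulative fractions are one way to judge how many components are worth looking at before the remaining ones mostly describe noise.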

Good luck, hope this helped.

Cheers