best way to do Principal Component Analysis (PCA)


james.odon

Apr 17, 2015, 11:49:50 PM
to qiime...@googlegroups.com
Hello all,

I need to do a principal component analysis on my data, but I have no experience with this and do not know anyone who has done it.
Is it possible to do this with QIIME? If not, what is the best way to go about it? What programming languages or packages are typically used for this?

Thanks in advance!

Jim

John Chase

Apr 19, 2015, 8:33:07 PM
to qiime...@googlegroups.com
Hi,

Yes, principal components analysis is possible with QIIME. You would run beta_diversity.py with the euclidean distance metric, and then run principal_coordinates.py on the distance matrix created in the first step.
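To make the two steps concrete, here is a minimal NumPy sketch of what they compute: a Euclidean distance matrix between samples, followed by a classical principal coordinates analysis on that matrix. This is an illustration under stated assumptions, not QIIME's own code; the function names and the small `counts` table (rows are samples, columns are taxa) are hypothetical.

```python
import numpy as np

def euclidean_distance_matrix(table):
    """Pairwise Euclidean distances between the rows (samples) of a table."""
    diff = table[:, None, :] - table[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def pcoa(dist):
    """Classical PCoA: double-center the squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # eigh returns ascending eigenvalues; reorder to descending
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Tiny negative eigenvalues are numerical noise for Euclidean distances
    eigvals = np.clip(eigvals, 0.0, None)
    coords = eigvecs * np.sqrt(eigvals)
    return coords, eigvals

# Hypothetical table: 4 samples (rows) x 3 taxa (columns)
counts = np.array([[10.0, 0.0, 3.0],
                   [8.0, 1.0, 4.0],
                   [0.0, 12.0, 7.0],
                   [1.0, 9.0, 6.0]])

dist = euclidean_distance_matrix(counts)   # step 1: distance matrix
coords, eigvals = pcoa(dist)               # step 2: principal coordinates
explained = eigvals / eigvals.sum()        # proportion of variation per axis
```

Each row of `coords` holds one sample's position on the principal axes, and `explained` gives the share of total variation carried by each axis.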

R is commonly used for this type of analysis, though I don't have enough of a background in R to comment on that. 

If anyone has any other suggestions, they would be welcome.

John


James O'Donnell

Apr 19, 2015, 8:50:07 PM
to qiime...@googlegroups.com
So that will give you a principal components analysis (PCA) rather than a principal coordinates analysis (PCoA)?

-Jim

John Chase

Apr 19, 2015, 9:02:52 PM
to qiime...@googlegroups.com
Yes, that's correct. Principal components analysis is a principal coordinates analysis in which the distance metric is specifically Euclidean distance.
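A quick numerical check of this equivalence, as a sketch: PCA scores are computed from the column-centered data by SVD (no scaling of the features, which is an assumption about the flavor of PCA meant here), and PCoA scores from the Euclidean distance matrix. The random `data` matrix is hypothetical, and the two sets of axes are expected to agree only up to the sign of each axis.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(6, 4))  # hypothetical: 6 samples x 4 features

# PCA: center each feature, then take sample scores from the SVD
centered = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
pca_scores = u * s

# PCoA: squared Euclidean distances, double-centering, eigendecomposition
sq_dist = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
n = data.shape[0]
centering = np.eye(n) - np.ones((n, n)) / n
gram = -0.5 * centering @ sq_dist @ centering
eigvals, eigvecs = np.linalg.eigh(gram)
order = np.argsort(eigvals)[::-1][:data.shape[1]]  # keep the top 4 axes
top_eigvals = np.clip(eigvals[order], 0.0, None)
pcoa_scores = eigvecs[:, order] * np.sqrt(top_eigvals)

# The sign of each axis is arbitrary, so compare absolute values
same_axes = np.allclose(np.abs(pca_scores), np.abs(pcoa_scores), atol=1e-6)
same_variances = np.allclose(top_eigvals, s ** 2)
```

The PCoA eigenvalues match the squared singular values from the PCA, which is why the proportion of variation explained per axis is the same in both analyses.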

John

spol...@gmail.com

Oct 20, 2015, 5:33:35 PM
to Qiime Forum
When I compare my results from beta_diversity.py using binary_euclidean vs. euclidean, the binary_euclidean result looks a lot more like what I expected once it is displayed with Emperor. The MDS plot I got with cummeRbund looks like a 2D projection of the 3D plot obtained with binary_euclidean. The page http://qiime.org/scripts/beta_diversity.html says that binary_ "specifies that a metric is qualitative, and considers only the presence or absence of each taxon". I am now not sure whether you meant euclidean or binary_euclidean. Can you comment on this, and help explain the difference?
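For illustration, here is a small sketch of how the two metrics can disagree. It assumes, following the documentation quoted above, that the binary variant reduces each sample to presence/absence before taking the Euclidean distance; the helper functions and the three sample vectors are hypothetical.

```python
import numpy as np

def euclidean(a, b):
    """Quantitative metric: uses the abundance of each taxon."""
    return float(np.sqrt(((a - b) ** 2).sum()))

def binary_euclidean(a, b):
    """Qualitative metric (assumed): convert to presence/absence first."""
    return euclidean((a > 0).astype(float), (b > 0).astype(float))

# Hypothetical abundances for four taxa
sample_a = np.array([100.0, 5.0, 0.0, 1.0])
sample_b = np.array([1.0, 5.0, 0.0, 100.0])   # same taxa as a, very different abundances
sample_c = np.array([100.0, 5.0, 2.0, 0.0])   # similar abundances, different rare taxa

quant_ab = euclidean(sample_a, sample_b)
quant_ac = euclidean(sample_a, sample_c)
qual_ab = binary_euclidean(sample_a, sample_b)
qual_ac = binary_euclidean(sample_a, sample_c)
```

With abundances, `sample_a` is far from `sample_b` and close to `sample_c`; with presence/absence only, `sample_a` and `sample_b` are identical and `sample_c` is the one that differs. The two metrics can therefore produce quite different ordinations of the same table.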
Thanks

Bill

Antonio González Peña

Oct 21, 2015, 9:12:03 AM
to Qiime Forum
Well, PCA is a special case of PCoA using euclidean distances. Now,
you are comparing your results from euclidean and binary_euclidean to
cummeRbund and to be honest I don't know what that script uses.
Furthermore, the documentation is not explicit about it:
http://bioconductor.org/packages/release/bioc/manuals/cummeRbund/man/cummeRbund.pdf.
Anyway, my guess is that they are using euclidean but they are log
transforming the values: logMode is set to TRUE by default. Thus,
I suggest setting that value to FALSE and checking whether the plots now
look more similar.
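
As a rough illustration of why a log transform could change the picture, the sketch below compares Euclidean distances on raw values with distances on log-transformed values. The `log10(x + 1)` transform and the example values are assumptions made for illustration only; this is not a description of what cummeRbund does internally.

```python
import numpy as np

def pairwise_euclidean(table):
    """Pairwise Euclidean distances between the rows of a table."""
    diff = table[:, None, :] - table[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical values: 3 samples (rows) x 3 features (columns)
values = np.array([[1000.0, 1.0, 10.0],
                   [2000.0, 1.0, 10.0],
                   [1000.0, 100.0, 10.0]])

raw_dist = pairwise_euclidean(values)
# Assumed transform: log10 with a pseudocount of 1 to avoid log(0)
log_dist = pairwise_euclidean(np.log10(values + 1.0))

# Which sample is nearest to sample 0 under each treatment?
nearest_raw = 1 if raw_dist[0, 1] < raw_dist[0, 2] else 2
nearest_log = 1 if log_dist[0, 1] < log_dist[0, 2] else 2
```

On the raw scale the most abundant feature dominates the distances, so sample 2 is the nearest neighbor of sample 0; after the log transform the fold change in the low-abundance feature matters more and sample 1 becomes the nearest. A difference of this kind would be enough to make two ordinations of the same data look unlike each other.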
Antonio