Unifrac PCA ordination.

1,998 views
Skip to first unread message

Peter Atanackov

unread,
Dec 1, 2015, 9:43:35 AM12/1/15
to qiime...@googlegroups.com
Hello,

I've been trying to make ordination plots but have been bumping in to obstacles.

I would like to make an UniFrac phylogenetic distance matrix, and make a principal component analysis ordination plot (PCA) based on it and also carry out a permutation test.

The problem is while Qiime can make an Unifrac matrix and perform a permutation test in can only perform a Principal Coordinates Analysis (PCoA) and not a Principal Components Analysis (PCA). 

I have since resorted to Vegan and Phyloseq without much success. While the Vegan library in R can perform PCA ordination and a permutation test, it can't make an Unifrac matrix in the first place. Phyloseq while it can make an Unifrac matrix, I am not sure it can perform PCA ordination and it can't perform a permutation test. 

So I'm left in a strange place where I can't complete the analysis with the most standard developed tools.

I am surely missing stuff, so my question is, could you suggest a library/method in R that can make an Unifrac matrix with which I could then make a PCA plot and possibly do a permutation test. Or should I try to convert the Qiime unifrac matrix to a Vegan readable format and continue on from there?

Thank you for any suggestions! 

Colin Brislawn

unread,
Dec 1, 2015, 12:26:29 PM12/1/15
to Qiime 1 Forum
Hello Peter,

I've been using Phyloseq (and other R packages) for all my downstream analysis with great success. 

Phyloseq while it can make an Unifrac matrix, I am not sure it can perform PCA ordination and it can't perform a permutation test.
I do this with Phyloseq all the time! Let me get you started.


I start by importing all my stuff from qiime into phyloseq. Then I calculate unifrac distances and ordinate, as discribed on this page:

ordu = ordinate(phyloseq_object, "PCA", "unifrac", weighted = TRUE) plot_ordination(phyloseq_object, ordu, color = "SampleType", shape = "human")

That first line calculates the distances and does the ordination, the second line makes a graph. 




Here is how to perform an adonis test (a permutation MANOVA, also in the vegan package) on your phyloseq_object:

df = as(sample_data(phyloseq_object), "data.frame")
d = distance(phyloseq_object, "unifrac")
phyloseq_adonis = adonis(d ~ SURFACE + GENDER + BUILDING + FLOOR, df)
phyloseq_adonis 

Good luck!
Colin


Peter Atanackov

unread,
Dec 1, 2015, 2:41:22 PM12/1/15
to qiime...@googlegroups.com
Hello Colin! 

Thank you for the reply, I unfortunately keep getting an error if I try to use PCA ordination.

In plot_ordination(DataTogether, ordu, color = "Description", shape = "Mof") :
  Could not obtain coordinates from the provided `ordination`. 
Please check your ordination method, and whether it is supported by `scores` or listed by phyloseq-package.

But if I put in PCoA it completes normally, how very strange.

Colin Brislawn

unread,
Dec 1, 2015, 4:02:43 PM12/1/15
to Qiime 1 Forum
Hello Peter,

It looks like this command is not working properly:
ordu = ordinate(phyloseq_object, "PCA", "unifrac", weighted = TRUE)

I thought "PCA" was a valid method, but I was wrong. Some valid methods are listed here:

dist = "bray"
ord_meths = c("DCA", "CCA", "RDA", "DPCoA", "NMDS", "MDS", "PCoA")
plist = llply(as.list(ord_meths), function(i, physeq, dist) {
    ordi = ordinate(physeq, method = i, distance = dist)
    plot_ordination(physeq, ordi, "samples", color = "SampleType")
}, GP1, dist)

I'm not familiar with the math behind all these methods. Does one of them work for you? (Is there a reason you want to use PCA over other metrics?)


Colin

  

John Chase

unread,
Dec 1, 2015, 4:23:49 PM12/1/15
to Qiime 1 Forum
Hi, 

It is worth mentioning that PCA is a special case of PCoA where the distances are euclidean. In QIIME if you run beta_diversity.py with a euclidean metric and then principle_coordinates.py on the resulting distance matrix that will be equivalent to PCA. There is a good explanation here http://mb3is.megx.net/gustame/dissimilarity-based-methods/principal-coordinates-analysis. And a forum post about it here: https://groups.google.com/d/msg/qiime-forum/ruKHPgdJ8Q4/itfjbuOH7hYJ.

John

Colin Brislawn

unread,
Dec 1, 2015, 4:46:43 PM12/1/15
to qiime...@googlegroups.com
OK. So when the distance metric used is Euclidean, PCoA is equivalent to PCA?

If that's correct, could you use something like this?
ordu = ordinate(phyloseq_object, "PCoA", "euclidean")

Distances supported by phyloseq and vegan are listed here:

John Chase

unread,
Dec 1, 2015, 4:52:21 PM12/1/15
to Qiime 1 Forum
That is my understanding, however with that said I am unfamiliar with what phyloseq is doing under the hood, so I can't guarantee you will see the identical results. The last response of the forum post I linked to mentions some places where differences may arise. I would be curious to hear the results if you were to perform the two side by side.

Peter Atanackov

unread,
Dec 1, 2015, 5:04:12 PM12/1/15
to qiime...@googlegroups.com
I need to do it in accordance to a certain protocol, so I need specifically an Unifrac PCA plot and a Jaccard NMDS plot, that is why the Euclidean matrix is sadly not an option. 

And this is the problem I've been facing, nothing does the Unifrac PCA combo by itself, I'll probably try to import the matrix from Qiime in to vegan, if I'm successful I'll write a small tutorial here because at least someone out there has to be in the same dilemma (ofcourse if anybody has solved this already I would love to hear about it). 

Otherwise I would have done like you suggested Colin, used a different method, RDA and MDS should produce very similar results to PCA (or so I've read...) still the tutorial was very useful because of the PERMANOVA analysis, I didn't know how to do that before, and it will be very useful. 

Colin Brislawn

unread,
Dec 1, 2015, 6:10:04 PM12/1/15
to Qiime 1 Forum
Hello Peter,

I'm glad you found adonis useful. It's a really great method to use after you see clear clustering in an ordination.

Sorry to keep pushing you on this, but can you direct me to this protocol which recommends unifrac followed by PCA (instead of PCoA)? I'm kind of surprised by this because the folks responsible for the UniFrac metric usually use PCoA to visualize the resulting distances. I'm curious about the specific reasons these authors recommend UniFrac + PCA and Jaccard + NMDS. 

Colin

John Chase

unread,
Dec 1, 2015, 6:10:18 PM12/1/15
to Qiime 1 Forum
Hello, 

I don't believe PCA is possible without euclidean distances, and PCoA with euclidean distances ends up being equivalent to PCA. There is an extensive discussion on stack exchange about this same question. If I am mistaken I apologize, I will continue to investigate possibilities. 

John

Colin Brislawn

unread,
Dec 1, 2015, 6:34:28 PM12/1/15
to Qiime 1 Forum
Hello John, Peter,


The following article includes a comparison of PCA and PCoA. 

This paragraph is interesting.
The PCA procedure basically calculates new synthetic variables (principal components), which are linear combinations of the original variables (for instance, the species of a sample-by-species table), and that account for as much of the variance of the original data as possible (Hotelling, 1933). 
So PCA is performed on the OTU counts in the OTU table (and not on UniFrac distances). 

I need specifically an Unifrac PCA plot.
Maybe the author of your protocol meant to say PCoA...? Once you calculate UniFrac distances, you are not doing PCA anymore.


Keep us posted!
Colin

Peter Atanackov

unread,
Dec 2, 2015, 12:19:47 PM12/2/15
to Qiime 1 Forum
Sorry Colin, John, for taking this long to answer, it is the earliest I could. 

Regarding the protocol I can't really show it sadly, otherwise it's pretty underwhelming in real life, but I did do the next best thing, I dug up a few journal papers in where the authors used PCA ordination in combination with the Unifrac distance matrix.

http://onlinelibrary.wiley.com/doi/10.1002/hep.24423/pdf

This is really interesting, since your logic also concurs with the fact that neither Qiime, vegan or phyloseq can perform the Unifrac matrix, PCA ordination combo.

Colin Brislawn

unread,
Dec 2, 2015, 1:47:07 PM12/2/15
to Qiime 1 Forum
Hello Peter,

These have been great questions! This gives me a good excuse to go read up on all these ordination methods used in our field.

(I'm plenty familiar with messy protocols. You don't have to share them if you don't want to!)

I've pulled quotes from the three papers here:
  1. The UniFrac Web application also permits many samples to be compared simultaneously using principal component analysis (PCA). Collection and rarefaction curves and richness observed were determined by the MOTHUR package18 using the distance matrices as input.
  2. A. Principal coordinates analysis (PCA) ordination of the bacterial communities from 15 north temperate lakes. The PCA was created using Unifrac distances (see Experimental procedures).
  3. Pyrosequencing data from every sample was pooled to define OTUs in order to conduct UniFrac principal component analysis (PCA) and partial least square discriminate analysis (PLS-DA).
Yeah... they are just misusing the term. As we have been discussing here, there is a difference between PCA and PCoA, but it looks like people are using the terms interchangeably. In all these cases, they really did PCoA (because PCA is only applicable to counts, not to UniFrac distances)

Colin

Peter Atanackov

unread,
Dec 2, 2015, 6:43:12 PM12/2/15
to qiime...@googlegroups.com
Hello Colin, 

great stuff, this has made things a lot clearer. It is still strange to me that editors in such well respected journals aren't more strict about the proper use of these terms.

Unrelated: I was wondering, for NMDS ordination it is normal that results vary a bit among runs, is that also true for PCoA? Mine do. 

Thanks again for all the help! I hope I'll be able to return the favour some day. Peter 

Colin Brislawn

unread,
Dec 2, 2015, 7:14:56 PM12/2/15
to Qiime 1 Forum
Hello Peter,

haha yeah, there are not many statisticians or mathematicians in this field. It's also an easy mistake to make, because PCA is a special case of PCoA (PCA == PCoA of Euclidean distances).

How are you running NMDS? I'm not sure if this method is deterministic (always ends the same way) or not. It may depend on the implementation in the program you are using. 

Glad to be able to help! Let us know if you have any other questions,
Colin

Reply all
Reply to author
Forward
0 new messages