Help Determining Factors Loaded on PCoA Axes (PC1, PC2, PC3)


brett...@gmail.com

May 20, 2015, 12:00:24 PM
to qiime...@googlegroups.com
Hello!

I am using QIIME 1.9.0 and ran core_diversity_analyses.py to perform the alpha and beta diversity analyses and generate the Emperor 3D plots all at once. I used only bacterial OTUs and no metadata in these analyses.

In my weighted UniFrac Emperor plot, I see a lot of separation along the PC1 axis (representing 37.24% of the variation) which is very interesting to me, of course.

However, despite looking through the forum for an answer to this, I cannot figure out how I would determine which OTUs loaded on PC1 (or PC2/3 for that matter). Is there a file generated by core_diversity_analyses.py (or any other script) that can show me which OTUs make up each PC and what magnitude of contribution they make? I would prefer a file that spells this out clearly without having to do a lot of manipulation since I'm still pretty new to the UNIX environment, but I realize that this may not be possible.

Hopefully I'm not overlooking anything simple. Thanks in advance!
Brett

Colin Brislawn

May 20, 2015, 1:45:38 PM
to qiime...@googlegroups.com
Hello Brett,

Yes, there should be a file like that, although the formatting may be tricky...

When running core_diversity_analyses.py, your output folder should contain many intermediate files and folders used to make the final plots. Take a look through the beta diversity folder for a collection of UniFrac distance matrices.

If you want to copy/paste your folder structure, I can help you sort through them.

Colin

brett...@gmail.com

May 20, 2015, 2:30:14 PM
to qiime...@googlegroups.com

Thanks Colin,


I'm not sure if there is an easier way to show the file structure, but this is what I figured out:

Bretts-MBP:AllSample_CDA_15k bloman2$ ls
arare_max15000 biom_table_summary.txt log_20150520112823.txt table_mc15000.biom.gz
bdiv_even15000 index.html table_even15000.biom.gz taxa_plots
Bretts-MBP:AllSample_CDA_15k bloman2$ cd bdiv_even15000
Bretts-MBP:bdiv_even15000 bloman2$ ls
unweighted_unifrac_dm.txt weighted_unifrac_dm.txt
unweighted_unifrac_emperor_pcoa_plot weighted_unifrac_emperor_pcoa_plot
unweighted_unifrac_pc.txt weighted_unifrac_pc.txt
Bretts-MBP:bdiv_even15000 bloman2$ cd weighted_unifrac_emperor_pcoa_plot
Bretts-MBP:weighted_unifrac_emperor_pcoa_plot bloman2$ ls
emperor_required_resources index.html

Please let me know if there's an easier way or if this is enough information. Are we looking for a .txt file, or what?

Thanks again,
Brett

Colin Brislawn

May 20, 2015, 3:03:06 PM
to qiime...@googlegroups.com
Haha, yeah recreating folder structures is a mess, but don't worry, I found them.

The files you want are:
unweighted_unifrac_dm.txt weighted_unifrac_dm.txt
unweighted_unifrac_pc.txt weighted_unifrac_pc.txt

The top two, the dm.txt files, are the UniFrac distance matrices.
The bottom two, the pc.txt files, are the principal coordinates you asked about.
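
For anyone who wants to pull the coordinates out of a pc.txt file without editing it by hand, here is a minimal Python sketch. It assumes the file is tab-separated with section headers such as "Eigvals", "Proportion explained", "Species" and "Site", where the "Site" rows hold one sample ID followed by its PC1, PC2, ... values; that layout is an assumption, so compare it with your own file first. The function name read_pc and the demo text are made up for illustration. Note that the rows it returns are coordinates of samples, not contributions of OTUs.

```python
import io

# Made-up miniature file for illustration only. The section names and layout
# are ASSUMED to match a QIIME 1.9 *_pc.txt file; check against your own file.
DEMO_PC = (
    "Eigvals\t2\n"
    "0.5\t0.25\n"
    "\n"
    "Proportion explained\t2\n"
    "0.6\t0.3\n"
    "\n"
    "Species\t0\t0\n"
    "\n"
    "Site\t3\t2\n"
    "S1\t0.1\t-0.2\n"
    "S2\t-0.3\t0.05\n"
    "S3\t0.2\t0.15\n"
    "\n"
    "Biplot\t0\t0\n"
    "\n"
    "Site constraints\t0\t0\n"
)


def read_pc(handle):
    """Return eigenvalues, proportion explained and per-sample coordinates."""
    lines = [line.rstrip("\r\n") for line in handle]
    parsed = {"eigvals": [], "proportion_explained": [], "site": {}}
    i = 0
    while i < len(lines):
        fields = lines[i].split("\t")
        label = fields[0].strip().lower()
        if label.startswith("eig") and i + 1 < len(lines):
            # The values sit on the line right after the header.
            parsed["eigvals"] = [float(v) for v in lines[i + 1].split("\t") if v.strip()]
            i += 2
        elif label.startswith("proportion") and i + 1 < len(lines):
            parsed["proportion_explained"] = [
                float(v) for v in lines[i + 1].split("\t") if v.strip()
            ]
            i += 2
        elif label == "site" and len(fields) > 1:
            # The header states how many sample rows follow.
            n_rows = int(fields[1])
            for row in lines[i + 1:i + 1 + n_rows]:
                parts = row.split("\t")
                parsed["site"][parts[0]] = [float(v) for v in parts[1:] if v.strip()]
            i += 1 + n_rows
        else:
            # Empty sections (e.g. "Species 0 0") and blank lines are skipped.
            i += 1
    return parsed


parsed = read_pc(io.StringIO(DEMO_PC))
# List the samples ordered along the first axis.
for sample, coords in sorted(parsed["site"].items(), key=lambda kv: kv[1][0]):
    print(sample, coords[0])
```

To use it on a real file, replace the io.StringIO(DEMO_PC) argument with an open file handle, for example open("weighted_unifrac_pc.txt").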


I hope that helps!
Colin

brett...@gmail.com

May 20, 2015, 3:39:02 PM
to qiime...@googlegroups.com
Okay, I was looking at these files before and couldn't really make sense of them beyond understanding what the corresponding "Eigenvalues" and "Proportion Explained" values meant.

The next labeled row down, "Species", has a completely blank row underneath it (unlike "eigenvalues" and "proportion explained", which have values in the row underneath them). Is this where the corresponding OTUs are supposed to be? I attached the file so that you can see exactly what I mean...

Brett
weighted_unifrac_pc.txt

Jai Ram Rideout

May 20, 2015, 4:35:51 PM
to qiime...@googlegroups.com
Hi Brett,

I don't think there's an easy way to do this in QIIME/Emperor. There's an open feature request on the QIIME issue tracker that provides more details and discussion. You might consider following up there to indicate that you'd find this a useful feature.

In the meantime, the discussion on that issue indicates that Emperor's biplots may be an alternative. For an example of what these look like, click on "Biplots Example" at the bottom of Emperor's homepage. make_emperor.py's --biplot_fp option may also be of interest to you.

Best,
Jai


jincheng.wang1986

May 20, 2015, 4:51:15 PM
to qiime...@googlegroups.com
Just a few thoughts.
The PCoA (Principal Coordinates Analysis) plot produced in QIIME is not a Principal Component Analysis (PCA) plot, although the two are similar. The PCoA plot in QIIME (or in general) is based on a distance matrix; in this case, the UniFrac distances between pairs of samples. A PCA plot, by contrast, is based on the multiple dimensions of information each sample has (such as the counts of OTU1, OTU2, OTU3, ... in that sample), so you can calculate how much each dimension (factor) loads on each axis. You cannot do that in PCoA, because the distance matrix that PCoA is based on does not necessarily retain the per-sample information on OTU1, OTU2, OTU3, and so on. I am not an expert on this, but this is what I understood from my own reading on the topic.
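
To make the point concrete, here is a rough NumPy sketch of classical PCoA (double-centering the squared distances, then an eigendecomposition). It is only an illustration of the general technique, not QIIME's own code; the function name pcoa and the tiny distance matrix are made up, and dropping non-positive eigenvalues is a simplification. The thing to notice is that the only input is the sample-by-sample distance matrix, so there are no OTU columns from which loadings could be computed.

```python
import numpy as np


def pcoa(dist):
    """Classical PCoA on a square, symmetric distance matrix (sketch only)."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances (Gower centering).
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order, so reverse them.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Simplification: keep only clearly positive eigenvalues.
    keep = eigvals > 1e-10 * eigvals.max()
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    proportion = eigvals[keep] / eigvals[keep].sum()
    return coords, proportion


# Made-up distances between three samples (not real UniFrac values).
demo_dm = np.array([
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
])
coords, proportion = pcoa(demo_dm)
print(coords)      # one row per sample, one column per retained axis
print(proportion)  # share of the retained variation on each axis
```

The output has one row per sample. Nothing in it refers to an individual OTU, which is why a "which OTUs load on PC1" table does not fall out of PCoA directly.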
The biplot Jai suggested may look similar, but the OTU coordinates plotted there actually come from the weighted eigenvectors (you can think of eigenvectors as unit direction vectors), and I am not exactly sure what that information tells me.
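
As a rough illustration of what "weighted" can mean here: one common way to place a taxon in a PCoA biplot is at the abundance-weighted average of the sample coordinates. Whether Emperor computes its biplot coordinates exactly this way is an assumption on my part, and the function name, sample IDs and numbers below are all made up.

```python
def taxa_biplot_coords(abundance, sample_coords):
    """Place each taxon at the abundance-weighted mean of the sample coordinates.

    abundance:     {taxon: {sample_id: relative_abundance}}
    sample_coords: {sample_id: [pc1, pc2, ...]}
    """
    n_axes = len(next(iter(sample_coords.values())))
    placed = {}
    for taxon, per_sample in abundance.items():
        total = sum(per_sample.get(s, 0.0) for s in sample_coords)
        if total == 0:
            continue  # taxon absent from every sample with coordinates
        placed[taxon] = [
            sum(per_sample.get(s, 0.0) * xyz[k] for s, xyz in sample_coords.items()) / total
            for k in range(n_axes)
        ]
    return placed


# Made-up sample coordinates and relative abundances, for illustration only.
demo_coords = {"S1": [0.2, 0.0], "S2": [-0.2, 0.1]}
demo_abundance = {
    "OTU_a": {"S1": 0.5},              # found only in S1
    "OTU_b": {"S1": 0.1, "S2": 0.1},   # equally abundant in both samples
}
placed = taxa_biplot_coords(demo_abundance, demo_coords)
for taxon, xyz in placed.items():
    print(taxon, xyz)
```

Under this scheme a taxon found in only one sample lands on that sample's point, and a taxon spread evenly across samples lands near the middle of them. So a position far out along PC1 suggests the taxon is concentrated in the samples at that end, which is a weaker statement than a PCA loading.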
Hope this helps.