Understanding qiime beta diversity results, PCOA plot, beta diversity matrix

julie smith

Aug 31, 2015, 7:51:38 PM

to qiime...@googlegroups.com

Hi,

I am very new to QIIME and to the metagenomics field, and I need help understanding the results and plots that I generated with QIIME.

I ran the beta diversity workflow script on my input biom file. There are 4 different metagenomic bacterial communities, and each community has around 20 species. I didn't group the 4 communities with each other; I just wanted to see the differences among them. The 4 communities are labelled Sample1, Sample2, Sample3 and Sample4 in the attached Emperor plots/screenshots.

I decided to do a beta diversity analysis on these 4 samples using QIIME. After installing QIIME, I prepared the input files by studying the tutorials and examples on the QIIME website, which were very helpful.

Attached is the input OTU txt file that I prepared and its biom version. The numbers in the OTU file represent the number of reads per species in each sample. I ran the beta diversity through plots workflow:

http://qiime.org/scripts/beta_diversity_through_plots.html

beta_diversity_through_plots.py -i OTU.biom -m map.txt -p Parameter.txt -o output -f

I am also pasting the commands that were run below. My questions are as follows; I would appreciate any feedback or comments.

1. What does the PCoA plot (attached) tell me about these 4 samples?

2. What do the PCoA distances tell me about these samples (file attached)?

3. How do I interpret the PC matrix? What does it tell me (file also attached)?

4. What do the numbers in the distance matrix tell me (file attached)?

Basically, I currently need some help to understand how to interpret these results. I would very much appreciate any insights you could provide.

Thank you,

Julie

# Beta Diversity (euclidean) command

beta_diversity.py -i OTU_species.biom -o outputDir --metrics euclidean

Stdout:

Stderr:

/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2507: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.

VisibleDeprecationWarning)

# Rename distance matrix (euclidean) command

mv /euclidean_OTU_species.txt euclidean_dm.txt

Stdout:

Stderr:

# Principal coordinates (euclidean) command

principal_coordinates.py -i euclidean_dm.txt -o euclidean_pc.txt

Stdout:

Stderr:

/usr/local/lib/python2.7/dist-packages/skbio/stats/ordination/_principal_coordinate_analysis.py:107: RuntimeWarning: The result contains negative eigenvalues. Please compare their magnitude with the magnitude of some of the largest positive eigenvalues. If the negative ones are smaller, it's probably safe to ignore them, but if they are large in magnitude, the results won't be useful. See the Notes section for more details. The smallest eigenvalue is -5.97764983814e-08 and the largest is 322496773.33.

RuntimeWarning

# Make emperor plots, euclidean) command

make_emperor.py -i euclidean_pc.txt -o /euclidean_emperor_pcoa_plot/ -m map.txt

Stdout:

Stderr:

Logging stopped at 16:18:16 on 25 Aug 2015
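
A side note on the RuntimeWarning from principal_coordinates.py in the log above: the message asks for the negative eigenvalue to be compared with the largest positive one. Below is a minimal Python sketch of that comparison, using the two values printed in the warning. The 1e-9 cutoff is an arbitrary assumption chosen for illustration, not a QIIME or scikit-bio setting.

```python
# Values copied from the RuntimeWarning text in the log.
smallest_eigenvalue = -5.97764983814e-08
largest_eigenvalue = 322496773.33

# Relative size of the negative eigenvalue compared with the largest one.
ratio = abs(smallest_eigenvalue) / largest_eigenvalue

# Illustrative cutoff only (an assumption, not a library default).
ILLUSTRATIVE_CUTOFF = 1e-9
negligible = ratio < ILLUSTRATIVE_CUTOFF

print("relative magnitude: %.3g" % ratio)
print("negligible under the illustrative cutoff: %s" % negligible)
```

The ratio comes out on the order of 1e-16, which is at the level of floating-point rounding, so under this reading the warning looks safe to ignore for this run.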

OTU_species.txt

OTU_species.biom.txt

euclidean_pc_table.txt

euclidean_dm.txt

map.txt

Parameter.txt

PCOAPlot_1.png

DistancePlot.png

Antonio González Peña

Sep 1, 2015, 8:20:02 AM

to Qiime Forum

At this point the best you can do is to say that, using Euclidean
distance (I'm not sure why you decided to use Euclidean, but I suggest
taking a look at
http://www.nature.com/nmeth/journal/v7/n10/abs/nmeth.1499.html),
green and red are more similar on PC1 than orange and blue. You could
do something similar with the other axes.
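
To make the PC matrix and the axes a bit more concrete, here is a minimal sketch of classical PCoA on a Euclidean distance matrix, written with NumPy. It is not QIIME's or scikit-bio's implementation; the function name `pcoa` and the 4 x 3 count table are hypothetical and only stand in for the attached files. Each row of `coords` is a sample, each column is a PC axis, and `explained` is the share of variation carried by each axis.

```python
import numpy as np


def pcoa(dm):
    """Classical PCoA on a square, symmetric distance matrix (sketch)."""
    dm = np.asarray(dm, dtype=float)
    n = dm.shape[0]
    # Gower double-centering of -1/2 times the squared distances.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dm ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # Sort axes from largest to smallest eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Tiny negative eigenvalues from rounding are clipped to zero.
    positive = np.clip(eigvals, 0, None)
    coords = eigvecs * np.sqrt(positive)
    explained = positive / positive.sum()
    return coords, explained


# Hypothetical read counts: 4 samples (rows) x 3 species (columns).
counts = np.array([
    [10, 0, 5],
    [12, 1, 4],
    [0, 20, 3],
    [1, 18, 9],
], dtype=float)

# Pairwise Euclidean distances between samples.
diff = counts[:, None, :] - counts[None, :, :]
dm = np.sqrt((diff ** 2).sum(axis=2))

coords, explained = pcoa(dm)
print("PC1 coordinates per sample:", coords[:, 0])
print("proportion explained per axis:", explained)
```

In this toy table the first two samples land close together on PC1 and far from the last two, which is the same kind of statement as the one about the colors above: samples that sit near each other on an axis are similar with respect to whatever variation that axis captures.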

Anyway, this conclusion is really basic and doesn't tell you
anything very interesting. The way to make it interesting is to
relate it to something in your metadata, for example a treatment or
something else that you are testing. Additionally, try to figure out
whether specific taxa drive these differences, and possibly even test
whether you could build a classifier to catch those differences.
However, to be honest, I'm not sure how much you can do with only 4
samples; for a lot of tests this is a really small number of samples.
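
One simple way to look for taxa behind a difference, when the metric is Euclidean, is to split the squared distance between two samples into per-taxon terms. The sketch below is only an illustration of that idea: the helper `taxon_contributions`, the species names, and the counts are all hypothetical and are not taken from the attached OTU table.

```python
def taxon_contributions(sample_a, sample_b):
    """Rank taxa by their share of the squared Euclidean distance.

    sample_a and sample_b map taxon name -> read count; a taxon missing
    from one sample is treated as a count of 0.
    """
    taxa = sorted(set(sample_a) | set(sample_b))
    squared = {t: (sample_a.get(t, 0) - sample_b.get(t, 0)) ** 2 for t in taxa}
    total = sum(squared.values())
    if total == 0:
        # Identical samples: no taxon contributes anything.
        return [(t, 0.0) for t in taxa]
    shares = [(t, squared[t] / total) for t in taxa]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts for two samples.
sample1 = {"species_A": 1200, "species_B": 30, "species_C": 5}
sample2 = {"species_A": 200, "species_B": 45, "species_C": 0}

ranked = taxon_contributions(sample1, sample2)
for taxon, share in ranked:
    print("%s: %.4f" % (taxon, share))
```

With these made-up numbers, the most abundant species accounts for nearly all of the squared distance. That is worth keeping in mind when reading a Euclidean distance matrix built from raw read counts.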

Finally, if you want to learn more about this, I suggest taking at
least 10 high-profile papers that look into things similar to what
you are looking at in your study and going over their methods
sections, so you can better understand the methodological decisions
they took. Note that the studies do not have to be the same as yours,
just similar, for example: the same questions in another environment,
the same environment with other questions, etc.

--
Antonio

julie smith

Sep 3, 2015, 2:20:38 PM

to qiime...@googlegroups.com

Antonio González Peña

Sep 3, 2015, 4:22:41 PM

to Qiime Forum

These questions were already answered here:
https://groups.google.com/forum/#!topic/qiime-forum/oboEutau8YQ

julie smith

Sep 14, 2015, 4:48:15 PM

to qiime...@googlegroups.com
