Hi,
I am new to alpha diversity analysis and need help to understand how to interpret the alpha rarefaction plot and table generated by Qiime with my data. I would appreciate any feedback on this.
I performed alpha diversity analysis on my 1 sample. The corresponding OTU tables (txt and biom) are attached. I am also attaching the rarefaction plot and the table generated by qiime for this sample.
What I noticed is that I see "nan" values in the table. Does this mean that something is wrong? I read in previous messages on this forum that I may have to rarefy my biom file to the maximum depth if I want to get rid of the nan values. Would this be the right approach? If so, what should the maximum depth be for my OTU table, and how do I determine it?
OR
Would it be ok to skip the rarefying of the biom file step? Basically, would the results I currently have still make sense? How do I interpret these results?
Thank you very much in advance for your help and any guidance on how to interpret this data.
Julie
ps: These are the commands that generated the attached plot and table from the attached OTU.biom file:
Executing commands.
# Alpha rarefaction command
multiple_rarefactions.py -i OTU.biom -m 10 -x 44087 -s 4407 -o /rarefaction/
Stdout:
Stderr:
# Alpha diversity on rarefied OTU tables command
alpha_diversity.py -i /rarefaction/ -o /alpha_div/ --metrics shannon,chao1,simpson,observed_species
Stdout:
Stderr:
# Collate alpha command
collate_alpha.py -i /alpha_div/ -o /alpha_div_collated/
Stdout:
Stderr:
# Removing intermediate files command
rm -r /rarefaction/ /alpha_div/
Stdout:
Stderr:
# Rarefaction plot: All metrics command
make_rarefaction_plots.py -i /alpha_div_collated/ -m /map.txt -o /alpha_rarefaction_plots/
Stdout:
Stderr:
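
For readers unfamiliar with three of the metrics named in the alpha_diversity.py command above, here is a minimal Python sketch of how observed species, Shannon, and Simpson can be computed from one sample's OTU counts. It is an illustration only, not QIIME's code: the counts are hypothetical, the log base 2 for Shannon is an assumption, and Simpson is taken here as 1 minus the sum of squared proportions.

```python
import math

def observed_species(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)

def shannon(counts, base=2):
    """Shannon entropy of the OTU proportions (log base is an assumption)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)

def simpson(counts):
    """Simpson index, taken here as 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical read counts for one sample (not taken from OTU.biom).
counts = [50, 30, 15, 5, 0]
print(observed_species(counts), shannon(counts), simpson(counts))
```

A perfectly even community gives the highest Shannon and Simpson values for a given number of OTUs, while a community dominated by one OTU gives values close to zero.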
Good morning Colin. First of all, thank you very much for your detailed and immediate response. I really appreciate your time and suggestions.
I will first answer the questions from your response, and I still have a few follow-up questions if you don't mind. I am very new to this area, so please excuse my ignorance of the topic. My responses are below; I numbered them to make them easier to follow:
1. "I see "nan" values in the table" When qiime calculates the alpha diversity, some of the metrics have the potential to divide by zero for a given sample. When a formula divides by zero, 'nan' is returned. I'm pretty sure this is related to analyzing one sample. (Do you have more than one sample?)
My answer: I have only 1 sample. What I mean by 1 sample is that it is a "mock bacterial community" containing 19 bacterial species, as you can see in the OTU.biom table that I attached to my first email. My goal in using the qiime alpha diversity script was to measure the species richness of this 1 mock bacterial community. Therefore, I ran the qiime alpha diversity commands on the OTU.biom table and got the plot and table I attached to my first email. Does this approach make sense? Also, could you please help me understand what the plot and the values in the table mean? For example, the plot rises and then stays at the same level. How do I interpret this? Also, in the results table, I see that observed species ave = 6.5 when seq/sample is 10, and then it is 19 or 18 for the remaining seq/sample values. What do these numbers tell me about this community?
2. "I performed alpha diversity analysis on my 1 sample."
I'm not sure that will help you very much. When I perform alpha diversity analysis, I use many samples and compare the alpha diversity between groups. For example, my results could tell me that samples in the group "no antibiotics" are more diverse than samples in the group "many antibiotics".
Your results, after running this on one sample, tell me you have 19 OTUs in that sample. (How many reads are in this sample? Do your other samples have more reads?)
My answer: What I mean by 1 sample is actually 1 bacterial community. As you noticed, this sample has 19 OTUs, which are represented in the OTU.biom table that I attached to my first email. So this OTU.biom is the only input I have for the qiime alpha diversity analysis. The numbers represent the number of reads of each OTU in the given mock community sample. Does this answer your question?
3. "Would it be ok to skip the rarefying of the biom file step?"
I want to make sure we are talking about the same step: 'alpha rarefaction', which is used to estimate alpha diversity, is often confused with 'rarefying', which is a normalization method. (The names are way too similar and this is a common mistake.)
To answer your question: When estimating alpha diversity, you should use a non-rarefied .biom table. When calculating beta diversity, using a .biom table that has been normalized is probably a good option. To normalize, you could use normalize_table.py or single_rarefaction.py.
My answer: Thank you very much for your explanation. For alpha diversity analysis, I use the alpha_rarefaction.py script (http://qiime.org/scripts/alpha_rarefaction.html)
For beta diversity: I did beta diversity analysis on another biom file which has 4 samples (4 bacterial communities). I am attaching the OTU_todoBetaDiversity.biom file to this email. For beta diversity analysis, I am using the following qiime script: http://qiime.org/scripts/beta_diversity_through_plots.html . Is it possible that this script does the normalization, or do I have to normalize the data before giving it as an input to this script?
Thank you very much for all your help,
Julie
ps: This is the command I ran in order to do beta diversity analysis.
# Beta Diversity (euclidean) command
beta_diversity.py -i OTU_todoBetaDiversity.biom -o outputDirectory --metrics euclidean
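
To show why normalization matters for a Euclidean metric like the one in the command above, here is a small Python sketch with two hypothetical samples that have the same composition but different sequencing depths. It illustrates the arithmetic only and says nothing about what the QIIME scripts do internally; converting to relative abundance is just one possible normalization, chosen here for simplicity.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length abundance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def to_relative(counts):
    """Convert raw read counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical samples: identical composition, 10x difference in depth.
sample_a = [10, 20, 70]
sample_b = [100, 200, 700]

raw_distance = euclidean(sample_a, sample_b)
relative_distance = euclidean(to_relative(sample_a), to_relative(sample_b))
print(raw_distance, relative_distance)
```

On raw counts the two samples look far apart purely because of read depth, while on proportions the distance is zero. This is the kind of artifact that normalizing the table before computing beta diversity is meant to avoid.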
Take your full community. Let's say this community contains 200 microbes.
Subsample at different read depths, from very low to very high.
Plot this, with sample depth on the x axis, and microbial richness on the y axis.
10 reads = 7 microbes
100 reads = 65 microbes
1000 reads = 187 microbes
10,000 reads = 199 microbes
100,000 reads = 199 microbes
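
The subsampling idea described here can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not QIIME's implementation: the community below is hypothetical (19 OTUs with uneven read counts), reads are drawn without replacement, and only one subsample is taken per depth, whereas a real run would average several.

```python
import random

def rarefaction_curve(counts, depths, seed=0):
    """Return (depth, observed OTUs) pairs from random subsamples of the reads."""
    rng = random.Random(seed)
    # Expand the counts into one entry per read, labelled by OTU index.
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            # Cannot draw more reads than the sample contains.
            curve.append((depth, None))
            continue
        subsample = rng.sample(pool, depth)
        curve.append((depth, len(set(subsample))))
    return curve

# Hypothetical mock community: 19 OTUs, 14,300 reads in total.
counts = [2000] * 5 + [500] * 8 + [50] * 6
for depth, richness in rarefaction_curve(counts, [10, 100, 1000, 10000]):
    print(depth, richness)
```

At low depths only the most abundant OTUs tend to be seen; once the depth is high enough, the count levels off at the true richness. That plateau is what a flat rarefaction curve indicates.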
multiple_rarefactions.py -i OTU.biom -m 50 -x 2000 -s 100 -o /rarefaction_mod
--
I had a hunch that 44087 could be the total number of reads in the OTU input file.
Could you please let me know where the numbers 44087 and 4407 came from?
Also, how does the number 4407 get calculated for the -s parameter, and where does it come from?
--
How does qiime come up with the number 4407 for the -s parameter?
-n, --num_steps
Number of steps (or rarefied OTU table sizes) to make between min and max counts [default: 10]
--min_rare_depth
The lower limit of rarefaction depths [default: 10]
-e, --max_rare_depth
The upper limit of rarefaction depths [default: median sequence/sample count]
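
One way to see how these defaults could produce the -m 10 -x 44087 -s 4407 values in the log is the arithmetic below. It is an inference, not something confirmed in this thread: it assumes the step is the integer part of (max depth - min depth) / number of steps, which happens to reproduce 4407. The function name is hypothetical.

```python
def rarefaction_depths(min_depth, max_depth, num_steps):
    """Derive a step size and the list of depths (assumed formula, see note above)."""
    # Integer part of the range divided by the number of steps; never below 1.
    step = int((max_depth - min_depth) / num_steps) or 1
    depths = list(range(min_depth, max_depth + 1, step))
    return step, depths

# Values from the log: min 10 (default), max 44087, 10 steps (default).
step, depths = rarefaction_depths(10, 44087, 10)
print(step)    # 4407
print(depths)  # 10, 4417, ..., 44080
```

Since the default upper limit is the median sequence/sample count, a table with a single sample would have a maximum depth equal to that sample's total read count, which is consistent with the hunch that 44087 is the number of reads in the input file.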
4. What do the numbers in the distance metric tell me (attached)?
1. What does the PCoA plot tell me about these 4 samples (attached plot)?
2. What do the PCoA distances tell me about these samples (attached file)?
3. How do I interpret the PC matrix? What does it tell me (file also attached)?