Observed species vs Shannon index (alpha rarefaction)

Jini

Nov 19, 2014, 9:55:12 AM

to qiime...@googlegroups.com

Hi Qiime group,

I'm still new to all this and have (what I believe is) a simple question.

I've gone through the 454 QIIME tutorial and generated alpha rarefaction curves using the command "alpha_rarefaction.py".

When I look at the generated curves (rarefaction_plots.html) and select "shannon", all my samples plateau after a certain number of sequences (about 200). When I select "observed species" instead (which, if I'm not mistaken, is the number of OTUs), the curves don't plateau. Even at the maximum sampling depth, they look like they may still be increasing. I set the maximum depth to the number of sequences in my sample with the fewest sequences. Why do the curves look different for Shannon and for observed species (OTUs)?

Hope someone can help me out!

Thanks

Jini

Will Van Treuren

Nov 21, 2014, 8:24:36 PM

to qiime...@googlegroups.com

Hi Jini,

You are seeing that because of the mathematical forms of the two measures. The Shannon index is calculated as H = -sum(p_i * ln(p_i)), where p_i is the proportion of the ith bug in the sample and ln is the natural log. As you increase your rarefaction depth, your p_i values stabilize. Imagine we had a community like the following:

Sample 1 = [1000, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Where we have 1000 counts of bug 1, 10 of bug 2, 1 of bug 3, etc. If I rarefy at 100 counts, I am likely to get a lot of bug 1, perhaps a couple of bug 2, and maybe 1 of the other bugs. If I rarefy at 200 counts, I'll get very similar results, even though I'll probably pick up a couple more of the 1-count bugs. What this means is that the Shannon diversity measure is pretty stable: adding low-proportion bugs doesn't change the score much. This makes intuitive sense from the perspective of what Shannon measures, namely entropy. If the sample is predominantly 1 bug with a couple of others at low abundance, entropy is very low: we could predict the next member of the sequence (the next draw from the pool) with high accuracy, because there is so much more of bug 1 than anything else. As an example rarefaction at 100 counts, let's assume I drew:

r1 = [97, 2, 0, 1]

shannon = -(.97 * ln(.97) + .02 * ln(.02) + .01 * ln(.01)) = 0.1538

(The bug with 0 counts is left out, since p * ln(p) goes to 0 as p goes to 0.)

Now, if I happened to find one more bug, i.e. I got

r2 = [96, 2, 1, 1]

shannon = -(.96 * ln(.96) + .02 * ln(.02) + .01 * ln(.01) + .01 * ln(.01)) = 0.2095

See how little Shannon has changed, even though I have found another new bug?
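A minimal Python sketch of the two hand calculations above, assuming the input is a list of raw per-bug counts. The function name `shannon` is hypothetical and this is not QIIME's own code. It uses the natural log as in the example; QIIME's implementation may use a different log base, which rescales every value by a constant factor but does not change the shape of a rarefaction curve.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) from raw counts (natural log).

    Bugs with zero counts are skipped, since p * ln(p) -> 0 as p -> 0.
    """
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


r1 = [97, 2, 0, 1]
r2 = [96, 2, 1, 1]
print(round(shannon(r1), 4))  # 0.1538
print(round(shannon(r2), 4))  # 0.2095
```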

In contrast, observed species will keep increasing as you increase rarefaction depth, because it's merely the number of unique things you observe. Even if only 1 in 100,000 reads comes from some bug, a deep enough rarefaction will eventually find it, and that will raise the number of observed species by 1. So, in our example:

r1 = [97, 2, 0, 1]

observed_species = 3

Now, if I happened to find one more bug, i.e. I got

r2 = [96, 2, 1, 1]

observed_species = 4
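The same comparison for observed species, as an illustrative sketch rather than QIIME's implementation (the function name is hypothetical): it just counts the bugs with a nonzero count, so finding one new bug always adds exactly 1, no matter how rare that bug is.

```python
def observed_species(counts):
    """Number of bugs (OTUs) with at least one count in the sample."""
    return sum(1 for c in counts if c > 0)


r1 = [97, 2, 0, 1]
r2 = [96, 2, 1, 1]
print(observed_species(r1))  # 3
print(observed_species(r2))  # 4
```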

The relative change in the two metrics is similar (Shannon goes from 0.1538 to 0.2095, about +36%; observed species goes from 3 to 4, about +33%), but the absolute change for Shannon is much smaller, and if you were to do this example with a larger vector, you'd see a much more pronounced difference.
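To see both behaviours across a whole rarefaction curve, here is a rough simulation on Sample 1 from above. It assumes rarefaction means drawing reads without replacement and averaging each metric over repeated draws at each depth. The depths, the number of iterations, the seed and all function names are arbitrary choices for illustration; alpha_rarefaction.py does its own subsampling, which this sketch does not attempt to reproduce.

```python
import math
import random


def shannon(counts):
    """Shannon index with the natural log; zero counts are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def observed_species(counts):
    """Number of bugs with a nonzero count."""
    return sum(1 for c in counts if c > 0)


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement; return per-bug counts."""
    pool = [bug for bug, c in enumerate(counts) for _ in range(c)]
    sub = [0] * len(counts)
    for bug in rng.sample(pool, depth):
        sub[bug] += 1
    return sub


def rarefaction_curve(counts, depths, iterations=100, seed=0):
    """(depth, mean Shannon, mean observed species) for each depth."""
    rng = random.Random(seed)
    rows = []
    for depth in depths:
        subs = [rarefy(counts, depth, rng) for _ in range(iterations)]
        mean_h = sum(shannon(s) for s in subs) / iterations
        mean_obs = sum(observed_species(s) for s in subs) / iterations
        rows.append((depth, mean_h, mean_obs))
    return rows


# Sample 1: 1000 of bug 1, 10 of bug 2, and ten bugs seen once each.
sample1 = [1000, 10] + [1] * 10

for depth, h, obs in rarefaction_curve(sample1, [10, 100, 200, 400, 800, 1020]):
    print(f"depth={depth:5d}  shannon={h:.3f}  observed_species={obs:.1f}")
```

Exact numbers depend on the seed, but the mean Shannon stays small at every depth (the full sample gives about 0.13), while observed species keeps climbing toward 12 as the singletons get picked up, which is the pattern described in the original question.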

Hope this helps,

Will

--

---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ji Pa

Nov 25, 2014, 12:24:23 PM

to qiime...@googlegroups.com

Thank you, Will; yes, this definitely helps.
