Alpha Rarefaction

323 views
Skip to first unread message

DoH

unread,
Jul 26, 2014, 10:50:05 AM7/26/14
to qiime...@googlegroups.com
Hi,

I am running the core_diversity.py script, that generates a rarefaction curve but it's strange as it's not smooth (as in attachment - photo). I do know what I could be doing wrong. I looked at the OTU summary table in order to pick the subsampling length (see attachment). In the parameter file (-p), I chose a sequencing depth of 970 (for single rarefaction). For multiple rafactions, I used the following parameters: 10 (min), 50000 (max), and 10 (num-reps).

I realize that the 970 for single rarefaction is way too low, so is there a way of letting the script to choose maybe 75% of the reads per sample?

Harris.
observed_speciesSampleID.png
rich.sparse_biom_table.txt

Will Van Treuren

unread,
Jul 26, 2014, 12:36:29 PM7/26/14
to qiime...@googlegroups.com
Hi Harris, 

The lack of smoothness comes from the fact that your y-scale is limited; you have a maximum of ~100 observed species, so even a fluctuation of 1 or 2 is going to be a noticeable deviation from the curve. I am not sure what environment you are sequencing (or how you derived those sequences), but your number of observed species is very low for your number of reads. Do the taxonomies of those sequences make sense or are you getting a larger number of unidentified things?

 For multiple rafactions, I used the following parameters: 10 (min), 50000 (max), and 10 (num-reps).
The graph suggests that there are far more than 10 reps (each deviation from the 'trend' line). Can you send the exact command you used for the core_diversity.py script so we can check whats going on?

I realize that the 970 for single rarefaction is way too low, so is there a way of letting the script to choose maybe 75% of the reads per sample?
I don't believe there is a numeric way to do this, or a way to force the script to bypass rarefaction. You could run all the analyses that core_diversity.py is doing separately using a biom table that is unrarefied. To get the 75% reads for each sample however, would require a tiny bit of custom scripting. 

Best,
Will 




--

---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

DoH

unread,
Jul 26, 2014, 1:19:17 PM7/26/14
to qiime...@googlegroups.com

Hi Will,

Thanks for the response. I am sequencing from an environment that I expect to get maybe up to 300 observed species (from literature). I have combined UPARSE and QIIME. The taxonomy makes sense, but when I use QIIME alone, the taxonomies are not fully resolved to the genus level.

I’m running both UPARSE and QIIME on cluster. This is the command for core diversity analyses:

core_diversity_analyses.py -i ../harris/rdp_assigned_taxonomy/table_tax.biom -m ../harris/mapping/file/mapping_file.txt -t ../harris/rep_set.tre -e 970 -c drug,diet -o core_diversity_analyses -p ../harris/qiime_parameters.txt

In the qiime_parameters.txt, I have the parameters that I had previously mentioned. They are defined as:

Single_rarefaction:depth             970

Multiple_rarefactions:min          10

Multiple_rarefactions:max         50000

Multiple_rarefactions:step         1000

Multiple_rarefactions:num_reps             10

 So do you advise me to use a value less than 10 (for reps)?

When you say “You could run all the analyses that core_diversity.py is doing separately using a biom table that is unrarefied”, do you mean I do not perform rarefaction? Is these better compared to rarefied biom table?

Harris

Will Van Treuren

unread,
Jul 27, 2014, 7:15:43 PM7/27/14
to qiime...@googlegroups.com

Hi Harris,
I misread, your number of reps is very reasonable. You are going to get jagged curves becauseof the low number of species regardless of your number of reps, but that isn't a problem.

As far as the question of rarefaction goes: rarefaction is important for some analyses and less so for others. I was suggesting that you might want to try things like group_significance.py and weighted UniFrac analysis without rarefying the table and see if you get the same results. Given that you have up to 40000 reads in some samples, 970 is pretty low (as you point out). There is discussion over when rarefaction should be used. For more info look at a recent paper by Susan Holmes and Paul McMurdy. They suggest that rarefaction is conservative (i.e. less rejection of the null hypothesis than you would want). Our experience suggests rarefaction is critical with unweighted UniFrac and alpha diversity calculations.

Best,
Will

DoH

unread,
Jul 28, 2014, 4:32:22 AM7/28/14
to qiime...@googlegroups.com
Thanks Will.
Reply all
Reply to author
Forward
0 new messages