Hi all,
I will have a brief reply here. I just found this post again.
Few points:
Yes, there is a difference between rarefaction and rarefying. Rarefying is throwing away data based on choosing the lowest sequencing depth and forcing all your samples to have the same sequencing depth, rarefaction is assessing how your richness estimate changes as you sample more deeply, either more sequences or more samples per treatment, species, etc.
What I was saying is if your rarefaction plots level off at where you are rarefying at (say leveling off at 2,000 sequences and you rarefy to 2,000 sequences) then you are sampling most of the community. In that case, rarefying and normalization will show similar results. Am I totally wrong about this Paul??
And, yes, as Paul pointed out I am being objective. I am not writing a paper on this. Thank you Paul for writing one using a solid statistical framework.
Paul, can you comment on -- is using normalization ok if you have say 6 samples: sample1 = 3,000 sequences, sample 2 = 30,000 sequences, sample 3 = 300,000 sequences, sample 4 = 3,000,000, sample 5 = 100,000, sample 6 = 1,000,000 with samples 1-3 coming from lake 1 and samples 4-6 coming from lake 2? Or maybe instead of ok is it better to use than rarefaction in making inference about alpha-diversity, beta-diversity and relative abundance differences between the two lakes?
On R
I have used Phyloseq in R and it is quite useful in plotting data. I have used other R packages, and Primer-E as well as QIIME to complement one another. Primer-E however costs $$
For alpha-diversity I take my text file from
alpha_diversity.py -i SMP_uparse_614_taxa.biom -m chao1,PD_whole_tree,observed_species,shannon -o adiv_614.txt -t SMP_uparse.tre
and use R to make plots, and analyze the data. More options in R with making figures.
For beta-diversity, I convert my .biom file into a site x species matrix and use vegan to do statistical analyses, with my mapping file being used as the environmental data file. Remember the site x species and environmental data file need to have the same sampleID order. I have also played around with using Primer-E. What I find is that all of these programs have little differences that change the p-values, r2-values more so than simple permutations could (being objective again). AND, you have to be careful and think about what you are doing to make sure it is not an oversight on your part (OR a developer, things happen!) versus a simple statistical preference such as using permutations of raw data versus permutations of residuals (PERMDISP) I prefer to use both jaccard and unifrac distances, and use two programs to analyze my beta-diversity work to believe what I am finding. And, visualization of data is by far the most helpful in assessing if the stats are totally off. Phyloseq, using ggplot2, makes great beta-diversity plots. There are lots of other great things in Phyloseq too that help with data visualization.
For abundance, I have not used any other program other than QIIME, but am just getting into those official analyses.
Cheers,
Carly