Should I be rarefying my data?


jenks

Apr 7, 2014, 6:32:38 AM
to qiime...@googlegroups.com
Hi All,

I've just read "Waste Not, Want Not: Why Rarefying Microbiome Data is Inadmissible" by McMurdie and Holmes (2014). You can find it here:

http://arxiv.org/abs/1310.0424

Everything I've read in the literature has suggested that rarefaction was standard methodology, but this paper suggests that it is inappropriate and that other normalization methods should be used. I'm not familiar with the statistics mentioned in the paper, and wondered what other people think. Is this something to be worried about?

Thanks,

Joe

Kyle Bittinger

Apr 7, 2014, 9:00:56 AM
to qiime...@googlegroups.com
This is indeed a thought-provoking paper. I expect some of the other QIIME users and developers will comment on it.

Although I am the forum master this week and should really be answering questions rather than asking them, I can't help but put this question to the forum:

For the last analysis you worked on, would your major conclusions have been different if you rarefied the OTU table vs. just eliminated samples with a tiny number of reads (maybe < 200-500 reads/sample)?

Best,
Kyle



Carly

Apr 28, 2014, 3:35:45 PM
to qiime...@googlegroups.com
Hi Joe,

I am a Ph.D. student looking at skin microbial communities across three species of salamanders. I used 454 sequencing to survey the entire skin bacterial community from skin swabs. My rarefaction plots indicate that at ~750 sequences/sample I am reaching the asymptote -- so, more or less, I have sampled the entire community. I have done analyses with the rarefied dataset (the lowest sample has 660 sequences, so I rarefied there) and with a normalized dataset. I have also done analyses using both uclust (the default in QIIME) and UPARSE to cluster OTUs. More on using UPARSE can be found here: https://groups.google.com/forum/#!searchin/qiime-forum/uparse/qiime-forum/zqmvpnZe26g/1F6LbQMkjToJ

In the end, I have compared alpha-diversity, beta-diversity, and abundance estimates across uclust/rarefied, UPARSE/rarefied, uclust/normalized, and UPARSE/normalized. What I have found is that my alpha-diversity measures told a different story -- the number of bacterial species per salamander differs with uclust clustering, but not with UPARSE clustering, regardless of rarefied vs. normalized. Beta-diversity and abundance measures stay pretty much the same regardless.

So, preliminarily, it did not matter for my dataset whether I used a rarefied or normalized OTU table, but it did matter for alpha-diversity estimates whether I used uclust versus UPARSE as my OTU clustering method. BUT, from my rarefaction plots, I have, for the most part, sampled the entire community. Normalization is likely more important when you have not sampled the entire community.

I see the argument in the paper you mention and have been encouraged by a committee member to use normalization as opposed to rarefaction.

I used metagenomeSeq to normalize: http://cbcb.umd.edu/software/metagenomeSeq

Here is the R code:

setwd("/Users/Carly/Desktop")

library("metagenomeSeq")
library(biom)

# Read the OTU table into a metagenomeSeq MRexperiment object
mybiom = load_biom("otu_table.biom")
mybiom

# Compute the percentile for cumulative-sum scaling (CSS) normalization
p = cumNormStatFast(mybiom)
mybiom = cumNorm(mybiom, p = p)

# Inspect the per-sample scaling factors
head(normFactors(mybiom))

# Export the CSS-normalized counts
normmybiom <- MRcounts(mybiom, norm = T)
write.csv(normmybiom, file = "norm_biom.csv")

Then you just convert the .csv back into a .biom file (you might have to save it as a .txt file first, or just write a tab-delimited .txt file from R, but make sure you have the columns properly delimited).
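Alternatively -- just a sketch, reusing the biom package loaded above (its make_biom() and write_biom() helpers) -- you can skip the .csv round trip and write the normalized table back to a .biom file directly from R:

# Sketch: write the CSS-normalized matrix straight back out as BIOM.
# make_biom() and write_biom() are from the 'biom' R package; adjust if
# your package version differs.
library(biom)
norm_table <- make_biom(normmybiom)
write_biom(norm_table, "otu_table_norm.biom")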

Again, these are preliminary analyses with methods that are new to me, so I will keep you posted when I go over this dataset again in the future.

Cheers,

Carly

Joe

Jun 11, 2014, 11:02:46 AM
to qiime...@googlegroups.com
I think it's important to note that methods like metagenomeSeq, edgeR, etc. were not developed with rarefied data in mind.

Despite the common practice of rarefying in the field, McMurdie and Holmes warn that it can lead to (1) potentially poorer estimates of global community diversity and (2) a loss of statistical power, regardless of the downstream method employed.

I don't think what Carly states is valid, though -- that rarefaction plots (an intuition about whether the community has been completely sequenced or not) are an argument for rarefying.



Colin Brislawn

Jun 12, 2014, 11:22:47 AM
to qiime...@googlegroups.com
At ASM I spoke with Paul McMurdie, one of the main developers of the phyloseq package. He made a really important distinction, which I want to reiterate here.

'Rarefying' is not the same as 'rarefaction.'

Rarefaction is when you sample without replacement at many different sequencing depths (multiple_rarefactions.py). This is what is used for alpha diversity, and it is totally statistically valid.
Rarefying is when you down-sample every sample to a single, even sequencing depth (single_rarefaction.py). This is how the QIIME devs normalize for sampling depth, and it is not statistically valid: removing data reduces your ability to detect differences between populations.
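To make the distinction concrete in R -- a rough sketch, not the QIIME implementation; it assumes the vegan and phyloseq packages and a phyloseq object named physeq:

library(phyloseq)
library(vegan)

# Rarefaction: repeated subsampling at many depths, per sample (alpha diversity)
otu <- as(otu_table(physeq), "matrix")
if (taxa_are_rows(physeq)) otu <- t(otu)   # vegan wants samples as rows
rarecurve(otu, step = 1000)

# Rarefying: one subsample down to a single even depth (the criticized step)
physeq_even <- rarefy_even_depth(physeq, sample.size = min(sample_sums(physeq)))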

Here is what the QIIME developers recommend:
-pick OTUs (uclust or usearch)
-remove chimeras
-single rarefy
-weighted or unweighted UniFrac
-beta diversity (PCoA plots)

Here is what the Waste Not paper recommends:
-pick OTUs (UPARSE)
-remove chimeras
-normalize for depth using their methods
-weighted UniFrac or another metric
-beta diversity (PCoA plots)


The main differences between the two pipelines are:
-the OTU picking algorithm (UPARSE over uclust)
-using their depth normalization method
-not using unweighted UniFrac


I really appreciate the excellent discussion going on here (especially from Carly); I just wanted to make sure that we don't confuse rarefaction (OK) with rarefying (criticized).


Has anyone here used the phyloseq package? How did it go?

Colin Brislawn




Matthew Stoll

Jun 12, 2014, 3:40:40 PM
to qiime...@googlegroups.com
So, what should you do if one sample has 15,000 sequences and another has 80,000? If you don't subsample at the lowest common depth (15,000 in this case), won't you end up with skewed estimates of alpha and beta diversity?

Thanks,

Matt


Colin Brislawn

Jun 12, 2014, 4:44:27 PM
to qiime...@googlegroups.com
I'm not entirely sure. Our lab is getting ready to implement phyloseq and benchmark it against QIIME's metrics.

For alpha diversity, nothing changes. You use parallel_multiple_rarefactions.py to do rarefactions from 1k to 15k and plot the curves (maybe with a step size of 1k and 50 iterations at each step), just like normal. Computing multiple rarefactions is not criticized by this paper.

This finding changes the way we approach beta diversity. As I mentioned above, we used to single rarefy and then run beta_diversity_through_plots.py to make PCoA plots of UniFrac distances (weighted and unweighted). The process of rarefying is criticized by this paper, so for the time being we are feeding our normal .biom table into beta_diversity_through_plots.py using only weighted UniFrac. The Waste Not paper shows that weighted UniFrac works OK with unrarefied data (labeled as 'proportional' in figure 4).
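If you want that same 'proportional' normalization inside R -- a minimal sketch, assuming a phyloseq object physeq that includes a phylogenetic tree; transform_sample_counts(), UniFrac(), ordinate(), and plot_ordination() are phyloseq functions:

library(phyloseq)

# Convert raw counts to within-sample proportions ('proportional' in the paper)
physeq_prop <- transform_sample_counts(physeq, function(x) x / sum(x))

# Weighted UniFrac on the proportional table, then PCoA
d <- UniFrac(physeq_prop, weighted = TRUE)
ord <- ordinate(physeq_prop, method = "PCoA", distance = d)
plot_ordination(physeq_prop, ord)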

What's your normalization strategy?

Colin







Matthew Stoll

Jun 12, 2014, 7:26:01 PM
to qiime...@googlegroups.com
I would do what you just described for alpha; for beta, I set the -e option to the lowest sequencing depth (excluding any really low samples). I just ordered some mock DNA samples, so I can try different approaches.


Colin Brislawn

Jun 12, 2014, 8:29:49 PM
to qiime...@googlegroups.com
Ah, OK. We used to do that using single rarefaction.

> I just ordered some mock DNA samples, so I can try different approaches.

Very cool! May I ask what different approaches you will test? Illumina vs. Ion Torrent? uclust vs. usearch vs. UPARSE? We are about to start benchmarking using our preexisting data sets, so I'm really interested in your strategy.

Colin

Matthew Stoll

Jun 13, 2014, 9:53:54 AM
to qiime...@googlegroups.com
Hi Colin,

Honestly, I haven't entirely decided which approaches to compare, and will probably discuss it with the bioinformatics folks here. We only have one machine (an Illumina MiSeq), so that will have to be a constant. I am particularly interested in comparing QIIME pipelines against using UPARSE directly from Robert Edgar's pipeline to construct the OTU table. It probably makes sense, as you suggested, to compare different clustering algorithms within QIIME as well.

Matt

Paul Joseph McMurdie

Jul 10, 2014, 5:14:43 PM
to qiime...@googlegroups.com
Hi all,

Carly, I want to rephrase what Joe and Colin have pointed out, and also clarify a couple of things that you've touched on. You stated,

"So, preliminarily it did not matter for my dataset whether I used rarified or normalized OTU table"

However, you have not used any objective criteria to infer this, and you do not know the "true" answer because you are comparing the results of exploratory methods on real data. This is why we emphasized mathematical results and simulations in the "Waste Not Want Not" article. We knew the correct answer, and so we could evaluate not only when rarefying had a negative impact on results, but how badly it affected sensitivity and specificity of downstream methods. It might be that the exploratory methods you are using are not showing you any detectable difference with and without rarefying, but it is very important to understand that this depends on (1) the sensitivity of the method(s) you are using, and (2) the effect size of the phenomena you are attempting to observe. For example, if you are comparing the microbiota of the open ocean with human feces, it doesn't matter what method you use because the differences are so large. Your own data might also have very large differences, and you can be as inefficient in the analyses as you like and still "see" the major trends. 

So, to be clear, the argument against rarefying is not that it mangles the data so badly that nothing useful can ever be achieved afterward. Many important observations have been made with analyses that used rarefying. However, by definition, these results were not as precise as they could have been, given the available data. This is a consequence of the statistical concept called "admissibility". Moreover, for datasets in which the effects are subtle, inferences based on rarefied data might be incomplete, or wrong. By definition, when you're using something like PCoA you are performing an exploratory analysis, and you probably do not know the "true" answer, or how subtle the effects are that you are attempting to detect. Unless you already know the correct answer (simulated data, control experiments, etc.), you won't know whether or not you needed the data you threw away during rarefying.

Does that make sense?

An important caveat: for differential abundance testing, you are no longer doing exploratory analyses. There are clear hypotheses being tested, usually many (often one per OTU). In this setting you are asking a lot more of the data, especially for the lower-abundance OTUs. Most of the scenarios we've simulated/tested in this setting are strongly negatively affected by rarefying. This is a setting where parametric methods like edgeR, DESeq2, metagenomeSeq, etc., could really help. Well-calibrated non-parametric methods could also perform well in some of these settings, but standardizing sample size by rarefying (throwing out data) is not among them.
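For what such a test can look like in practice -- a minimal sketch, not a prescription from the paper; it assumes phyloseq and DESeq2, a phyloseq object physeq, and a hypothetical metadata column Group -- phyloseq ships a converter for exactly this:

library(phyloseq)
library(DESeq2)

# Sketch: per-OTU differential abundance on raw (unrarefied) counts.
# "Group" is a made-up sample variable; substitute your own design.
dds <- phyloseq_to_deseq2(physeq, ~ Group)
dds <- DESeq(dds, fitType = "parametric", test = "Wald")
res <- results(dds)                   # log fold changes and adjusted p-values
head(res[order(res$padj), ])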

Hope that helps!

Paul McMurdie

Paul Joseph McMurdie

Jul 10, 2014, 5:30:58 PM
to qiime...@googlegroups.com
Matt,

Unfortunately, there has been a lot of misinformation spread around on this topic. The problem of having unequal sample sizes is not new or unsolved in statistics, and throwing away data from the larger sample(s) is a crude approach, at best. The alpha-diversity field has been thinking about this with a bit more mathematical/statistical rigor for a long time, and they don't recommend rarefying. As you might guess, the larger the library size, the "better" your estimate of alpha-diversity. You can try this for yourself by inspecting how the confidence intervals shrink as library size increases.

You can take it to the extreme (mathematicians love doing this) and imagine you have a magic DNA sequencer that provides a library of infinite size -- but for only one sample per experiment. The other samples get normal (finite) sized libraries. What do you do? Randomly subsample from the infinite library, and then attempt to "estimate" the alpha diversity? In this case you could perfectly determine any alpha-diversity metric you want for that sample. Why would you give up that precision just because you have less precision for the other samples? This analogy also applies to beta diversity, and you can generalize this notion of differing uncertainty, differing "sample sizes", to any other analysis, with varying consequences.
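You can watch that shrinkage in a toy example -- a sketch using the BCI dataset bundled with vegan; rrarefy() and estimateR() are vegan functions:

library(vegan)
data(BCI)

# Pool the BCI tree counts into one large "sample", then compare the chao1
# estimate and its standard error at increasing subsampling depths.
pooled <- t(as.matrix(colSums(BCI)))    # 1 x species count matrix
for (depth in c(500, 2000, 8000)) {
  sub <- rrarefy(pooled, depth)         # random subsample at this library size
  est <- estimateR(c(sub))              # chao1 richness and its standard error
  cat(depth, est["S.chao1"], est["se.chao1"], "\n")
}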

Hope that is helpful

Paul McMurdie

Colin Brislawn

Jul 11, 2014, 12:07:46 PM
to qiime...@googlegroups.com
Hey Paul,

The hypothetical example of infinite sequencing depth (which implies capturing total diversity) is a great example of why rarefying is guaranteed to reduce the strength of your analysis. I think I get it.

What I don't fully understand is how we should proceed from here. An example:

We have a groundwater project with 50-ish samples, a full MiSeq run, and four sample types. Data quality is good, but sequencing depth is really uneven: two sample types have an order of magnitude more sequences than the other two. Our challenge is to estimate the diversity of each sample type when 90% of our data comes from half of the sample types.

Rarefaction curves from near zero up to the minimum sampling depth show that the sample type "water" is more diverse than the other ones. It's higher in all richness metrics, and significantly higher using the chao1 metric. It's a cool finding! But the sample type "water" is less deeply sequenced.
When we redo this analysis including the full data set, the sample types with more sequences blow "water" out of the water (pun totally intended). This makes sense: more reads == more potential to observe the diversity present. But the initial rarefaction curves imply that we would still observe more diversity in water, if only we had adequate sequencing depth.

How do you advise we characterize alpha diversity when sequencing depth differs dramatically?


Help us Joey711!
Colin

Alex

Sep 8, 2014, 7:34:53 PM
to qiime...@googlegroups.com
Hi everyone,

I am also curious about how to proceed from here. I am in a similar situation to Colin's. I'm working on microbial data from stream sediment, and from previous studies I expect that differences will be subtle or difficult to detect, so I really don't want to be discarding any relevant data. I have 512 samples and rather uneven sequencing depth, ranging from 110 to 8125:

 Min: 110.0
 Max: 8125.0
 Median: 1395.000
 Mean: 1848.688
 Std. dev.: 1559.232

The 110 is not ridiculously small, as the sample depths go up fairly evenly from there. Rarefying to 110 would lose a ridiculous amount of data, but choosing any other number seems arbitrary.

Unfortunately, I don't have a lot of time to play around with the data and different methods of analysis. Given the implications of the paper and the dramatically differing sequencing depths, what is a sensible approach to measuring alpha and beta diversity in QIIME?

Thanks,
Alex

Colin Brislawn

Sep 8, 2014, 8:08:21 PM
to qiime...@googlegroups.com
Hello Alex,

I still have not found an elegant or statistically defensible solution to this problem. It's a mess. 

For your data set in particular, I would be very concerned about the sequencing depth of all your samples. Did you do all 512 samples on a single MiSeq run? Our projects may put 20 to 50 samples on a run. For environments with few or dilute microbes, we throw away samples containing less than 1,000 reads. For environments like stream sediment, where more microbes are expected, we might throw away samples with less than 10,000 reads (see the phyloseq sketch after the summary below), because our data set looks like this:

Num samples: 36
Num observations: 2378 (OTUs)
...
Counts/sample summary:
 Min: 3098.0
 Max: 158251.0
 Median: 30693.500
 Mean: 42584.778
 Std. dev.: 34727.174

Your data set is much smaller than that, which increases the need to retain all the data you have in order to maintain statistical power.
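In phyloseq, that kind of minimum-library-size cutoff is a one-liner -- a sketch, assuming a phyloseq object physeq and whatever threshold fits your environment; prune_samples() and sample_sums() are phyloseq functions:

library(phyloseq)

# Drop samples whose library size falls below the chosen threshold
physeq_kept <- prune_samples(sample_sums(physeq) >= 1000, physeq)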

For the other forum members: how do you normalize sequencing depth?
Do we have something more powerful than rarefying?

Colin




Alex

Sep 8, 2014, 8:39:26 PM
to qiime...@googlegroups.com
Hi Colin,

Thanks for your reply. I guess we'll all be working on what to do with data like this.

In terms of sequencing, I am not sure how many samples were done per run, as we sent them to another lab. The 512 samples come from different batches that we sent to the lab, so they would not all have been done together. The samples were from ephemeral streams with sandy sediment, and a number of them failed at the DNA extraction step because not enough DNA could be extracted. It is therefore at least plausible that there really wasn't much life there. I will have to go and check the sample dates and locations for those samples to make sure, though. Regardless, I would be interested to know if anyone has a suggestion for dealing with the variability in sequencing depth without losing data.

Alex

Carly

Feb 18, 2015, 9:50:56 PM
to qiime...@googlegroups.com
Hi all,

A brief reply here -- I just found this post again.

A few points:

Yes, there is a difference between rarefaction and rarefying. Rarefying is throwing away data by choosing the lowest sequencing depth and forcing all your samples to that same depth; rarefaction is assessing how your richness estimate changes as you sample more deeply, with either more sequences or more samples per treatment, species, etc.

What I was saying is that if your rarefaction plots level off at the depth you are rarefying to (say, leveling off at 2,000 sequences when you rarefy to 2,000 sequences), then you are sampling most of the community. In that case, rarefying and normalization will show similar results. Am I totally wrong about this, Paul?

And, yes, as Paul pointed out, I am not being objective here -- I am not writing a paper on this. Thank you, Paul, for writing one using a solid statistical framework.

Paul, can you comment on this: is using normalization OK if you have, say, 6 samples -- sample 1 = 3,000 sequences, sample 2 = 30,000, sample 3 = 300,000, sample 4 = 3,000,000, sample 5 = 100,000, and sample 6 = 1,000,000 -- with samples 1-3 coming from lake 1 and samples 4-6 coming from lake 2? Or, rather than just OK, is it better than rarefying for making inferences about alpha-diversity, beta-diversity, and relative abundance differences between the two lakes?

On R 

metagenomeSeq is what I am using for normalization. Here is a recent GitHub issue from when I ran into a problem doing the normalization: https://github.com/nosson/metagenomeSeq/issues/16

I have used phyloseq in R and it is quite useful for plotting data. I have used other R packages, and Primer-E, as well as QIIME, to complement one another. Primer-E, however, costs $$.

For alpha-diversity, I take the text file from

alpha_diversity.py -i SMP_uparse_614_taxa.biom -m chao1,PD_whole_tree,observed_species,shannon -o adiv_614.txt -t SMP_uparse.tre

and use R to make the plots and analyze the data. There are more options in R for making figures.
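Something like this works for getting that output into R -- a sketch; the mapping-file name and the Species column are hypothetical placeholders for your own metadata:

# alpha_diversity.py writes a tab-delimited table with samples as rows and
# metrics (chao1, PD_whole_tree, ...) as columns.
adiv <- read.delim("adiv_614.txt", row.names = 1)
map  <- read.delim("mapping.txt", row.names = 1)   # hypothetical mapping file
adiv$Species <- map[rownames(adiv), "Species"]     # hypothetical column
boxplot(chao1 ~ Species, data = adiv, ylab = "chao1 richness")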

For beta-diversity, I convert my .biom file into a site x species matrix and use vegan to do the statistical analyses, with my mapping file serving as the environmental data file. Remember, the site x species matrix and the environmental data file need to have the same sampleID order. I have also played around with using Primer-E.

What I find is that all of these programs have little differences that change the p-values and r2-values more than simple permutations could (being objective again). AND, you have to be careful and think about what you are doing, to make sure a discrepancy is not an oversight on your part (OR a developer's -- things happen!) rather than a simple statistical preference, such as using permutations of raw data versus permutations of residuals (PERMDISP). I prefer to use both Jaccard and UniFrac distances, and I use two programs to analyze my beta-diversity work before I believe what I am finding.

Visualization of the data is by far the most helpful in assessing whether the stats are totally off. phyloseq, using ggplot2, makes great beta-diversity plots. There are lots of other great things in phyloseq, too, that help with data visualization.
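For anyone who wants to try the vegan route, here is a minimal sketch. The file names and the Treatment column are hypothetical placeholders; vegdist(), adonis(), and betadisper() are vegan functions:

library(vegan)

comm <- read.csv("site_by_species.csv", row.names = 1)  # samples x OTUs
env  <- read.csv("mapping.csv", row.names = 1)          # environmental data
env  <- env[rownames(comm), , drop = FALSE]             # enforce same sampleID order

d <- vegdist(comm, method = "jaccard")                  # Jaccard distances
adonis(d ~ Treatment, data = env)                       # PERMANOVA
anova(betadisper(d, env$Treatment))                     # dispersion test (PERMDISP)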

For abundance, I have not used any program other than QIIME, but I am just getting into those analyses formally.

Cheers,

Carly