Correct normalization method to use?


Daiv9

Mar 16, 2016, 2:57:20 PM
to Qiime 1 Forum
Hi All,

I have performed MiSeq sequencing on 40 samples. The number of reads per sample ranges from 75k to 428k (avg. 207k, std. 87.3k). I have run the pick_open_reference_otus.py script. Does anybody have suggestions on which method I should use to normalize the BIOM file (matrix normalization or rarefaction)?

Best
Daiv

Embriette

Mar 17, 2016, 11:26:28 AM
to Qiime 1 Forum
Hi Daiv,

You can just rarefy your OTU table once you have it. This can be done via single_rarefaction.py.
You can also specify a rarefaction depth for a non-rarefied OTU table when you run beta_diversity_through_plots.py, as well as when running core_diversity_analyses.py.
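
For readers who want to see what rarefying does to a single sample, here is a minimal Python sketch. It is not QIIME's implementation. It assumes a sample is a plain dict of OTU ID to read count; the function name `rarefy` and the example counts are made up for illustration. A sample with fewer reads than the requested depth is returned as `None`, mirroring the fact that such samples drop out of an evenly rarefied table.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample one sample to `depth` reads, without replacement.

    counts: dict mapping OTU ID -> read count (hypothetical layout).
    Returns a new dict, or None if the sample has fewer than `depth` reads.
    """
    total = sum(counts.values())
    if total < depth:
        return None  # too shallow: the sample is dropped
    # Expand to one entry per read, then draw `depth` reads at random.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(reads, depth)
    return dict(Counter(picked))


sample = {"OTU_1": 700, "OTU_2": 250, "OTU_3": 45, "OTU_4": 5}
even = rarefy(sample, depth=100)
print(sum(even.values()))  # always equal to the requested depth
```

Rare OTUs such as `OTU_4` may be missing from the output entirely, which is the loss of observations discussed later in this thread.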

Thanks!

Embriette

Roger Huerlimann

Mar 22, 2016, 2:58:22 AM
to Qiime 1 Forum
What is currently more recommended?

Normalisation through rarefaction or through CSS/DESeq?  Are there cases when one is preferable to the other?

Kind regards,
Roger

Colin Brislawn

Mar 22, 2016, 1:25:09 PM
to Qiime 1 Forum
Hello Daiv, Roger, others,

This is a really cool question, partly because it's hard and partly because it's controversial. Normalization in microbiome studies is currently contested and the field has not arrived at a consensus. Reasonable scientists disagree. Here is my perspective on the history (which of course includes my bias on this subject).

  1. Normalization is always necessary.
    If you don't normalize, sequencing depth / reads per sample / sampling effort will change metrics of alpha and beta diversity. Samples with more reads will look richer (alpha diversity) and cluster together in a PCoA (beta diversity). I would encourage you to try this on your own data sets, because this trend is always present and sometimes very strong. (You can add 'reads per sample' into your metadata file and color-code by this in Emperor. It's super easy to see.) (I also wrote up a toy example.)
  2. Rarefying / subsampling is imperfect...
    When you subsample to an even depth, you are throwing away observations and this reduces your ability to detect differences in alpha and beta diversity. To me, this is intuitively true; fewer observations == less ability to tell things apart. But just in case it's not, McMurdie and Holmes prove mathematically and then demonstrate computationally that rarefying is guaranteed to reduce resolution.
    http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003531 
    https://joey711.github.io/waste-not-supplemental/ 
  3. Rarefying / subsampling is imperfect... but good enough.
    So while this normalization method is guaranteed to reduce resolution, it's simple to understand. (Can you elegantly explain how quantile normalization or variance stabilization works? What do you do with the negative counts from variance stabilization?) In practice, it also works very well in many studies because the communities are different enough that you don't need to keep every read to distinguish them. Some folks argue that it's 'worth it' to lose this resolution because of the simplicity and clarity of this method.
    https://peerj.com/preprints/1157/
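
The depth effect from point 1 can be reproduced with a toy simulation. The sketch below is an illustration only, and it is not the toy example mentioned above. It assumes a made-up community of 500 OTUs with a long tail of rare members, sequences it once, and counts the OTUs seen in the first 1,000 reads versus all 50,000 reads. The community shape, the depths, and the function names are all assumptions.

```python
import random


def simulate_reads(n_otus, n_reads, seed=0):
    """Draw reads from a community where OTU k has relative weight 1/k."""
    rng = random.Random(seed)
    otus = list(range(1, n_otus + 1))
    weights = [1.0 / k for k in otus]
    return rng.choices(otus, weights=weights, k=n_reads)


def observed_richness(reads):
    """Number of distinct OTUs seen in a list of reads."""
    return len(set(reads))


reads = simulate_reads(n_otus=500, n_reads=50_000)
shallow = observed_richness(reads[:1_000])   # same community, fewer reads
deep = observed_richness(reads)              # same community, all reads
print(f"1,000 reads: {shallow} OTUs; 50,000 reads: {deep} OTUs")
```

Both numbers describe the same community, yet the deeper sample looks richer simply because more reads were drawn.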

Previously, rarefying / subsampling was the standard way to normalize. Since this method was shown to be imperfect, the field has not formed a consensus on what to do. Many people are still rarefying, because it works. By skimming the current issue of Nature Microbiome, I found one paper that used rarefying and one that did not. The choice is up to you.

What does your lab do? 

Colin Brislawn 


Roger Huerlimann

Mar 22, 2016, 7:21:46 PM
to Qiime 1 Forum
Hi Colin,

Thanks for yet another thorough explanation. I like following the qiime forum and expanding my understanding.

Our lab is still new to metagenomics, and it's more of a side project for us. Our collaborator recommended using CSS, and that's what we will be using for most of our publications.

It's on my to-do list to revisit my metagenomics pipeline soon, and to get a better understanding of the different normalisation techniques.
As you mentioned, rarefaction is quite easy to understand compared to the other methods I know of. Do you know of a good summary comparing rarefaction, CSS and DESeq?

Roger

Colin Brislawn

Mar 22, 2016, 7:54:21 PM
to Qiime 1 Forum
The McMurdie paper is definitely a great place to start. Also, Joey McMurdie writes the best methods and supplementary sections.


This series of papers is cool:
A lab proposes a new normalization method called CSS: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4010126/ 
Another lab reproduces the paper and calls BS, arguing that the authors used the method inconsistently (cheated) to make CSS look better: http://www.nature.com/nmeth/journal/v11/n4/full/nmeth.2897.html 
The first lab strikes back, experimentally justifying their methods and use of pseudocounts while implicitly acknowledging their over-optimization. 

I like these papers because they clearly describe their methods and reasons for criticism. They do not, however, present a method which is conclusively better. 

Let me know what you think. Perhaps your collaborators have a compelling reason for using CSS.
Colin Brislawn 

Sophie

Mar 24, 2016, 10:10:59 AM
to Qiime 1 Forum
As Colin does a good job of explaining, there are strengths and weaknesses of each normalization technique. 

An important thing to do is make sure your results are robust to normalization technique.  So, verify your results using two normalization methods.

If you are using presence/absence distance metrics, e.g. binary Jaccard or unweighted UniFrac, rarefying often does best.
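
As a reference point for what "presence/absence" means here, a minimal sketch of binary Jaccard distance follows. It assumes each sample is a dict of OTU ID to read count (a hypothetical layout, as are the function name and example values); QIIME computes this over the whole OTU table rather than pair by pair.

```python
def binary_jaccard(a, b):
    """Binary Jaccard distance: 1 - |shared OTUs| / |OTUs in either sample|."""
    present_a = {otu for otu, n in a.items() if n > 0}
    present_b = {otu for otu, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty samples: treated as identical (a convention)
    return 1.0 - len(present_a & present_b) / len(union)


s1 = {"OTU_1": 900, "OTU_2": 90, "OTU_3": 1}
s2 = {"OTU_1": 12, "OTU_2": 3}
print(binary_jaccard(s1, s2))  # abundances are ignored; only OTU_3 differs
```

Because only presence matters, a single read of a rare OTU counts as much as 900 reads of a dominant one, which is one reason uneven sequencing depth can affect these metrics strongly.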

Thanks,
Sophie

Sachia

Feb 27, 2017, 4:27:59 AM
to Qiime 1 Forum
Hi guys,

I hope it is okay that I revive this thread; otherwise let me know and I will start a new question.

I can say that I/we are moving towards using GLMs (generalised linear models) on our sequence data when we want to test our questions, as an alternative to the distance-based methods. GLMs are not perfect either, but they handle some other challenges that are also common for NGS-type data; I am thinking of skewed variance-mean relationships, or heteroscedasticity. If you are interested in reading more, I can recommend (in addition to all the great suggestions in this thread) Warton et al. 2012, "Distance-based multivariate analyses confound location and dispersion effects", and the hands-on paper explaining how to use the mvabund package in R by Wang et al. 2012, "mvabund - an R package for model-based analysis of multivariate abundance data".

However, I am still keen on discussing normalization of OTU tables, as I often find that an MDS plot is a great visual aid to launch into your paper/story, and so I would like to keep making these. My challenge is that I often get datasets containing samples that obviously have not been sampled deeply enough (e.g. one sample with 1143 reads vs. another with 58299 reads), but I still want to make an MDS plot using one of the metrics that include abundances, e.g. Bray-Curtis (so not the presence/absence types like Jaccard).

Now, there's an issue with Bray-Curtis-based MDS and the output from DESeq/DESeq2, as they will produce negative values for low read counts. Does anyone have any alternative recommendations, and what about proportions (OTU abundances normalised to sample size)? McMurdie and Holmes point out that one issue with proportions is that they fail to account for heteroscedasticity, but if I am only after an MDS plot, does it matter?
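
To make the proportions option concrete, here is a minimal sketch of proportion scaling followed by Bray-Curtis dissimilarity, assuming samples are dicts of OTU ID to read count (function names and OTU counts are illustrative; only the two library sizes are taken from the post). It is meant to show the arithmetic, not to recommend the approach. The Bray-Curtis formula, the sum of |a - b| divided by the sum of (a + b), assumes non-negative abundances, which is why negative transformed values are a problem.

```python
def to_proportions(counts):
    """Scale counts so they sum to 1 (relative abundance)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {otu: n / total for otu, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity; expects non-negative abundances."""
    otus = set(a) | set(b)
    if any(a.get(o, 0) < 0 or b.get(o, 0) < 0 for o in otus):
        raise ValueError("Bray-Curtis is not meaningful for negative values")
    num = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    den = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return num / den if den else 0.0


small = {"OTU_1": 800, "OTU_2": 300, "OTU_3": 43}          # 1,143 reads
large = {"OTU_1": 40000, "OTU_2": 15000, "OTU_3": 3299}    # 58,299 reads
print(bray_curtis(small, large))                                   # raw counts
print(bray_curtis(to_proportions(small), to_proportions(large)))   # proportions
```

On raw counts the two samples look very different, mostly because of depth; on proportions they look similar. Proportions remove the depth difference from this comparison, but they do nothing about the heteroscedasticity concern.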


Sincerely,

Sachia





jonsan

Feb 27, 2017, 11:16:58 AM
to Qiime 1 Forum
Hi Sachia,

If the purpose of doing the MDS plot is primarily as a visual aid, I'd follow Colin's recommendations above that rarefaction is a useful (if imperfect) approach. Simply using non-normalized but proportional tables still gives an issue with lots of nonzero counts for rare taxa in samples that are more deeply sequenced, and in my hands tends to lead to the visualization of primarily technical variance. 

Cheers,
-jon

Jay T

Feb 27, 2017, 11:39:21 AM
to Qiime 1 Forum
The CSS/DESeq method made my samples in the taxonomy plots look exactly the same. I would be extremely cautious about using it. Rarefying seems to work better.

Colin Brislawn

Feb 27, 2017, 12:43:50 PM
to Qiime 1 Forum
Good morning folks,

One year later, it's kind of cool to revisit this thread. While I was moving away from rarefying at the time (early 2016), I find that I still use it today (early 2017).

In retrospect, one takeaway message for me is 'one size does NOT fit all.' Just like different stat tests have different expectations and different visualizations emphasize different things, different normalization methods can complement each other.

For example, you could make a bar graph using all reads (scaled by percent), or perform DESeq2 testing with raw reads. Next, you could make an MDS plot of rarefied samples (because if you don't rarefy, the samples cluster by depth). Essentially, you can match your normalization with your analysis.

"CSS/Deseq method made my samples in the taxonomy plots look exactly the same"
That's interesting, Jay. I find that barplots are pretty blunt and insensitive to minor statistical changes. Maybe differences would show up more in an ordination...

Thanks for the great commentary, folks!

Colin

Sachia

Feb 27, 2017, 1:39:44 PM
to qiime...@googlegroups.com
Hi guys,

Thanks for your input. I get your arguments, but I have to say I am equally worried about rarefying a dataset with such a huge span in library sizes. Like Colin, I have been moving away from rarefying for some time now, but have found that I am still using it in connection with MDS plots. However, one (to me) crucial feature of those earlier datasets was that they were more evenly sized, or that the smallest sample still had a relatively decent library size.
Now I find myself with a dataset for which the community has been extracted from a host organism, and the PCR blocker did not work 100%, so a significant fraction of total reads are host DNA, which we remove from the OTU table. That leaves me with some samples as low as 10^2 reads. *EDIT: biologically it made sense to us that some samples would be almost "empty" after removing the host, because some organisms had empty guts, so we do not want to discard them as we would if a sample were small because of the sequencing run* (Like you said, there's always a new challenge with a new dataset). I have also discussed this with some colleagues today, and several are using DESeq2 and PCA instead of rarefying + MDS.
Or am I just seeing issues wherever I go these days?

Jay, I haven't tried that, but it sounds very similar to what a colleague has experienced. He actually just went back to rarefying that dataset.

Cheers,

Sachia

jonsan

Feb 27, 2017, 2:33:04 PM
to Qiime 1 Forum
This is a really interesting problem, and it speaks to my own experience -- input (target) biomass has a huge effect, and it should be explicitly considered in and of itself. If you think that the proportion of host reads is a meaningful number biologically, an argument could be made to not normalize (or to normalize prior to host read removal), because the overall number of remaining non-host reads is itself directly relevant. This would be analogous to using Bray-Curtis in an ecological survey where the absolute number of counts of different species was itself meaningful. However, that's a big if, and there's probably a ton that can affect that host proportion that's essentially noise, and will impact your output due to the compositionality of the overall dataset. 

But that in itself is a different question from how the communities of non-host reads change. It still may be appropriate to have some minimum cutoff below which you don't have enough confidence in your sampling, but in that case the additional power afforded by the statistical normalizations might be useful.
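
The minimum-cutoff idea can be sketched as a simple filter on reads per sample. This assumes the OTU table is a dict of sample ID to a dict of OTU counts; the function name, the sample IDs, and the threshold of 1,000 reads are placeholders, not recommendations.

```python
def filter_by_depth(table, min_reads):
    """Split samples into (kept, dropped) by total read count."""
    kept, dropped = {}, {}
    for sample_id, counts in table.items():
        target = kept if sum(counts.values()) >= min_reads else dropped
        target[sample_id] = counts
    return kept, dropped


table = {
    "gut_A": {"OTU_1": 5200, "OTU_2": 880},
    "gut_B": {"OTU_1": 90, "OTU_2": 12},   # nearly empty after host removal
}
kept, dropped = filter_by_depth(table, min_reads=1000)
print(sorted(kept), sorted(dropped))
```

Returning the dropped samples separately keeps them available for reporting, which matters when a low count may be biologically meaningful rather than a technical failure.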

-j

Sachia

Feb 27, 2017, 3:07:32 PM
to Qiime 1 Forum
Hi Jon,

thanks again for replying.

I'm not too set on using proportions :) I just thought it might be an alternative where you do not have to throw away a lot of data/information from the large samples. You are so right, we cannot put too much emphasis on the biological what-ifs that may contribute to variable sample sizes. We are indeed cutting away small samples prior to host removal, but we have also chosen to keep the samples that passed this threshold but became small after host removal. However, the resulting reduction in some sample sizes makes me reluctant to rarefy after that, and that was why I wanted to discuss normalisations specifically for use with Bray-Curtis and MDS.

Cheers,

Sachia
