Normalization for NGS count data with high variance between observations / uneven communities

tric...@uni-bremen.de

Aug 23, 2017, 9:02:23 AM
to Qiime 1 Forum
Dear group,

We have a MiSeq 16S dataset featuring samples from enrichment studies, i.e. communities from a time series in which some OTUs become dominant over time, e.g. up to 90% of all reads. The biological question is to find a) which OTUs respond to different enrichment strategies and b) when they start to enrich. I guess this qualifies as an expression analysis.

Thus, we need to normalize the data, both because of highly variable sequencing depths (20,000 to 70,000 reads) and to validate our post-hoc analysis.

I tried percentile-based normalization like CSS, but I have just learned the hard way that such methods are not suited for this dataset (they typically expect relatively invariant data). CSS, for example, just took away observations from the enriched OTUs until the enrichment effect was no longer visible.

Rarefying is inadmissible, as McMurdie & Holmes told us.
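
To make the term concrete: rarefying means randomly subsampling every sample, without replacement, down to one common read depth and discarding the remaining reads. The sketch below illustrates this for a single sample. The function name `rarefy`, the counts, and the target depth are hypothetical, and this is not the QIIME implementation.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts without replacement to `depth` reads."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one OTU index per read, then draw `depth` reads.
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    rng = random.Random(seed)
    kept = rng.sample(reads, depth)
    # Tally the kept reads back into a count vector.
    rarefied = [0] * len(counts)
    for otu in kept:
        rarefied[otu] += 1
    return rarefied

sample = [900, 60, 30, 10]        # hypothetical sample with 1,000 reads
rarefied = rarefy(sample, 200)    # keep 200 reads, discard the other 800
```

In this example 800 of the 1,000 reads are thrown away, which is the loss of data that the objection to rarefying is about.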

Total-Sum-Scaling (i.e. scaling to all reads in a sample) is dangerous because it is sensitive to compositional effects (as our samples tend to become very uneven over time).
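
The compositional effect can be shown with a small made-up example. Assume, purely for illustration, that only OTU 0 grows in absolute terms while the other three OTUs stay constant. After total-sum scaling, the unchanged OTUs nevertheless appear to decline. The counts and the helper name `total_sum_scale` are hypothetical.

```python
def total_sum_scale(counts):
    """Divide each OTU count by the sample's total read count."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical absolute abundances: only OTU 0 changes between time points.
early = [100, 100, 100, 100]
late = [2700, 100, 100, 100]

tss_early = total_sum_scale(early)   # every OTU at 25%
tss_late = total_sum_scale(late)     # OTU 0 at 90%, the others at about 3.3%

# Apparent fold change of OTU 1, whose absolute abundance never changed.
apparent_fold_change = tss_late[1] / tss_early[1]
```

Here OTU 1 seems to drop to roughly one seventh of its earlier proportion, although nothing about it changed except the growth of OTU 0.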

Any ideas how to best treat the data would be greatly appreciated.




Colin Brislawn

Aug 23, 2017, 12:41:00 PM
to Qiime 1 Forum
Good morning,

This is a tough challenge because many of the common methods do not work well, just like you described. Here are some software packages which are designed to handle data like this:

ANCOM: 

Balance trees / Gneiss:
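
Both approaches work with log-ratios between OTUs rather than with proportions. As a rough sketch of the idea behind a balance (not the Gneiss API), the function below computes the isometric log-ratio between two groups of OTUs: the scaled difference of their mean log counts. The function name, the pseudocount of 1 used to handle zeros, and the example counts are assumptions made for illustration.

```python
import math

def balance(counts, numerator, denominator, pseudocount=1.0):
    """Isometric log-ratio between two groups of OTU indices in one sample."""
    num = [math.log(counts[i] + pseudocount) for i in numerator]
    den = [math.log(counts[i] + pseudocount) for i in denominator]
    r, s = len(num), len(den)
    # Mean of logs equals the log of the geometric mean of each group.
    scale = math.sqrt(r * s / (r + s))
    return scale * (sum(num) / r - sum(den) / s)

# Hypothetical counts: only OTU 0 is enriched at the later time point.
early = [100, 100, 100, 100]
late = [2700, 100, 100, 100]

enriched_early = balance(early, [0], [1, 2, 3])   # OTU 0 vs. the rest
enriched_late = balance(late, [0], [1, 2, 3])
background_late = balance(late, [1], [2, 3])      # among unchanged OTUs
```

In this toy case the balance of OTU 0 against the rest rises from zero to a positive value, while the balance among the unchanged OTUs stays at zero, so the enrichment does not leak into the comparison of the other OTUs.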

Let me know if that's helpful.

Colin

tric...@uni-bremen.de

Aug 24, 2017, 6:53:47 AM
to Qiime 1 Forum
Thank you Colin,

I already tried ANCOM and found it easy to use and interpret. I'll take a look at the balance trees as well.
There is also Olesen et al., 2016, PLoS ONE, based on Poisson models of OTU distribution (but I don't think they say anything about OTU table normalization).

What I would like to see, however, is the implementation of a downstream step which produces a universally usable OTU table, based on a consensus, widely applied method (much like rarefying before it was deemed "inadmissible"), which can be used in a variety of analyses (alpha, beta, differential expression, etc.).


Right now, this "master input" file seems to be the raw, untransformed, unnormalized, non-rarefied OTU table, as obtained in a QIIME BIOM file, which then needs to be subjected to different transformation/normalization methods depending on the research question.
This is a bit unsatisfying, especially when one seeks to communicate properly with co-researchers.

Colin Brislawn

Aug 24, 2017, 2:10:34 PM
to Qiime 1 Forum
Good afternoon,

That Olesen et al. (2016) paper in PLoS ONE looks very cool! Thanks for sharing that.

"What I would like to see, however, is the implementation of a downstream step which produces a universally usable OTU table"
Me too! But I'm not sure one exists...

One of the main results of Paul's "Never rarefy. Ever." paper was that more microbiologists started thinking about some of the statistical challenges of the field and paying more attention to the requirements of their statistical methods. I think this increased statistical literacy is helpful, but you do have to clearly communicate the statistical limitations to co-researchers, just like you said.

While Paul McMurdie does not rarefy, other researchers make a different argument. See this paper, which replicates and extends parts of Paul's paper.
 

"...which then needs to be subjected to different transformation/normalization methods depending on the research question."
I do this too. I think that's the current state of the field.

I hope others can 'qiime in' to this conversation. How do you normalize these days?

Colin

leah reshef

Aug 26, 2017, 10:55:59 AM
to Qiime 1 Forum
I still rarefy. Maybe it's my thick, non-mathematically-oriented head, but I still think it's the lesser of evils.