CSS Normalization Before Weighted UniFrac


zabe...@ucdavis.edu

Aug 31, 2015, 2:45:24 PM
to Qiime Forum
Hi all,



I used the normalize_table.py script to normalize my OTU table with cumulative sum scaling (CSS).  I then generated principal coordinates and produced a weighted UniFrac 2D PCoA plot from the normalized OTU table. I noticed this PCoA plot looks quite different from the PCoA plot generated when I did not normalize the table; general community trends such as separation by time are present in both PCoA plots, but the relative orientations of each point on the plot are very different. 

Is it admissible to normalize your OTU table before performing UniFrac analyses?  Or is it best to use the standard OTU table, with raw sequence counts, and just rarefy the samples (like passing the -e argument in beta_diversity_through_plots.py)? 
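
For concreteness, the two workflows I'm comparing look roughly like this (file names and the rarefaction depth are placeholders for my data; flags are as I read them in the QIIME 1.9 script documentation):

    # Workflow A: CSS-normalize, then weighted UniFrac / PCoA on the normalized table
    normalize_table.py -i otu_table.biom -a CSS -o otu_table_css.biom
    beta_diversity_through_plots.py -i otu_table_css.biom -m map.txt -t rep_set.tre -o bdiv_css/

    # Workflow B: raw counts, rarefied to an even depth inside the workflow script
    beta_diversity_through_plots.py -i otu_table.biom -m map.txt -t rep_set.tre -e 1000 -o bdiv_rarefied/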

Thanks for any advice!
 

Colin Brislawn

Aug 31, 2015, 4:55:23 PM
to Qiime Forum
Good afternoon,
 
Thank you for describing your approach. I think it sounds defensible.

Is it admissible to normalize your OTU table before performing UniFrac analyses? 
Yes! In fact, normalization should be used to remove library size as a confounding factor when performing an ordination like PCoA.

A better question may be: what is the best normalization method to use before UniFrac or ordination? It's not clear whether CSS, DESeq2, rarefaction, or raw counts (no normalization) works best. (Note: you should not normalize twice. For example, running CSS on an already-rarefied table is bad.)


From the documentation of normalize_table.py:
For any of these methods, clustering by sequence depth MUST BE CHECKED FOR as a confounding variable, e.g. by coloring by sequences/sample on a PCoA plot and testing for correlations between taxa abundances and sequencing depth with e.g. adonis in compare_categories.py, or observation_metadata_correlation.py.
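
In practice, that check might look something like this (a sketch: it assumes you've added each sample's sequence count to your mapping file as a column, here called SequenceDepth, and that the distance matrix path matches your beta diversity output):

    # Get per-sample sequence counts (to paste into the mapping file and to color PCoA plots by)
    biom summarize-table -i otu_table.biom -o table_summary.txt

    # Test whether sequencing depth explains variation in the distance matrix
    compare_categories.py --method adonis -i bdiv_css/weighted_unifrac_dm.txt -m map_with_depth.txt -c SequenceDepth -o adonis_depth/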


Colin

Antonio González Peña

Aug 31, 2015, 6:09:43 PM
to Qiime Forum
Suggest taking a look at: https://peerj.com/preprints/1157/



--
Antonio

zabe...@ucdavis.edu

Aug 31, 2015, 6:24:18 PM
to Qiime Forum
Thank you! Both responses were very helpful.

Andrew Krohn

Sep 3, 2015, 7:41:17 PM
to Qiime Forum
I agree with Colin that this is a very useful approach, and it is my standard approach for now. The Paulson paper made a convincing case for the ability of CSS normalization to resolve differences between sample groups, and McMurdie and Holmes spelled out in the simplest of terms (see Fig. 1 from that paper) how rarefying can hurt your analysis.

In practice, I don't see much difference between the two, but I like knowing that I am using normalization where it counts to control for the effects of overdispersion. That said, many people ask me what else to use the normalized table for, and the answer is that you only use it for your test of total beta diversity. Group significance, alpha diversity comparisons, etc. should be done on rarefied data instead.
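
To make that split concrete, a minimal sketch (the depth, metrics, and file names are only examples):

    # Rarefy once to an even depth, then compute alpha diversity on that rarefied table
    single_rarefaction.py -i otu_table.biom -o otu_table_even1000.biom -d 1000
    alpha_diversity.py -i otu_table_even1000.biom -m observed_species,chao1,PD_whole_tree -t rep_set.tre -o alpha_div.txt

The normalized (e.g. CSS) table would then be used only for the beta diversity / ordination step, as in the commands earlier in the thread.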

Andrew Krohn

Sep 3, 2015, 8:08:19 PM
to Qiime Forum
I am reading through Sophie's paper now (very nice), and one of the conclusions I think is very important to follow:

"For the parametric differential abundance techniques, it is recommended that rare OTUs be filtered out of the matrix prior to differential abundance testing....we advise OTU filtering after rarefying, and then applying non-parametric tests."

One caveat is that the Caporaso 2012 data were Illumina single-indexed on a separate read. Such data are surprisingly susceptible to sample cross-talk during sequencing (<0.3%; Kircher et al. 2012), so I wonder if this was the best data set for investigating the effect of "rare OTUs" when you can't really be certain which sample such sequences belong to. Since the rare OTUs seem to be driving a lot of the differences observed due to sequencing depth, the indexing scheme may be partially responsible for creating the observed effect. Still, it sounds like as long as you have adequate depth (>1000 per sample) you should be OK, and this is in line with last year's conclusion from Smith and Peay that depth counts more than PCR replication of samples.

Colin Brislawn

Sep 3, 2015, 8:09:03 PM
to Qiime Forum
Hello Andrew,

You mention:
The Paulson paper was quite convincing in the effect of CSS normalization

It's not perfect, and CSS normalization was immediately criticized upon publication.
To which Paulson replied:

This discussion, which hinges on the treatment of zeros in the data set, underscores the weakness of all these models. What should you do when a zero in your abundance table could be 'not present' or 'missed observation?'


You also mention:
Group significance, alpha diversity comparisons etc should be done on rarefied data instead.
This is the exact opposite of the conclusion that Joseph McMurdie (phyloseq dev) draws. From his tutorial:
it is probably not a bad idea to prune OTUs that are not present in any of the samples (for some reason there are a few in "GlobalPatterns") – BUT DON'T TRIM MORE THAN THAT!

I view rarefying (subsampling) as a conceptually simple but imperfect normalization method. I'm interested in finding a better one.


Colin

Andrew Krohn

Sep 3, 2015, 8:32:48 PM
to qiime...@googlegroups.com
Unfortunately my institution doesn't give me access to Nature Methods and I probably won't ILL the letters, but it is interesting to know about this exchange.

Personally, I think that in amplicon sequencing, especially when sampling from similar habitats, the assumption must be that a zero simply represents a missed observation.

That said, others I have talked to throw out a lot of low-abundance observations because of the mixed-clustering issue, unless they use confirmatory indexes (unique indexes on both indexing reads) rather than simple combinatorial indexes or, as my dissertation data were generated (gulp), single indexes. When low-abundance OTUs are simply discarded, less of the community can be addressed, and yet the conclusions may be more confident, because a zero-count "missed observation" can be pushed toward the "not present" camp as far as confident observations are concerned (certainty that the sequence belongs to a given sample).

Subsampling should give you a better estimate of individual OTU proportions when doing group comparisons, as opposed to total beta diversity. The overdispersion corrected for by normalizing has a lot to do with simply detecting more OTUs at greater depth, but normalizing also skews the observed abundances in your data, so it won't yield valid proportions for abundance comparisons. Amend et al. (2010?) showed that these "quasi-quantitative" (is that how they put it?) differences are important for observing community differences, even if they aren't necessarily the true proportions.

I also think that removing singletons or "low-count-tons" is good practice. With my single-indexed data, that alone is absolutely insufficient, since samples can "share" observations due to indexing inaccuracy on the sequencer. Trim that noise right away. Also kill unshared OTUs, since they could easily be PCR artifacts; if you think one is a real sequence, you need to get more DNA (a fresh extraction) from the suspect location and resequence a few true replicates. For those of us with single-indexed data, the general conclusions you observe are correct, but there will be increasing doubt as to which habitat an OTU was observed in as its observation counts diminish.
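
For what it's worth, both of those trims can be done in one pass with filter_otus_from_otu_table.py (the thresholds here are only illustrative):

    # Drop OTUs with fewer than 2 total observations and OTUs observed in fewer than 2 samples
    filter_otus_from_otu_table.py -i otu_table.biom -o otu_table_mc2_s2.biom -n 2 -s 2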

One other thing I would add to this discussion is that very stringent quality filtering seems to remove a lot of the noise I otherwise observe (singletons/doubletons, etc. that said tutorial might otherwise retain). I typically throw out 70-80% of my data in favor of only those sequences of exceptional quality. Sure, this reduces my depth, so am I seeing an effect of count? Maybe, but my final depths are still greater than 1000 most of the time, which according to the Weiss manuscript is adequate to avoid most count-based issues.

Agree on subsampling being imperfect, but I wouldn't discount it entirely either as it allows us to get the data to fit certain assumptions without which we can't perform essential statistical tests.



Sophie

Sep 8, 2015, 1:17:32 PM
to Qiime Forum
Agree with both of you on subsampling being imperfect, but it is the best option at the moment for presence/absence metrics, like unweighted UniFrac and binary Jaccard.

Also, to be safe (after checking for sequence depth clustering), you could report results on which two normalization techniques agree.

Jay T

Apr 22, 2016, 12:44:41 AM
to Qiime 1 Forum
I'm curious whether anyone normalizes or rarefies their OTU table when looking at taxonomic summaries. I have noticed that library size can potentially affect the distribution of taxa represented in my bar plots, even when running the subsampled open-reference OTU picking strategy. Hence, samples with a significantly higher number of sequences may over-represent certain taxa.

Typically my workflow consists of the following scripts:

pick_open_reference_otus.py - subsampled open-reference OTU picking
core_diversity_analyses.py - using a sequencing depth anywhere between 1,800 and 15,000 (I've read above 1,000 is good practice)

The core diversity script is supposed to remove singletons and rarefy the OTU table. 
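
In command form, the workflow above is roughly this (file names, the reference database from qiime_config, and the depth of 1,800 are just examples):

    pick_open_reference_otus.py -i seqs.fna -o otus/
    core_diversity_analyses.py -i otus/otu_table_mc2_w_tax_no_pynast_failures.biom -o core_div/ -m map.txt -t otus/rep_set.tre -e 1800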

After that, I look at the results for alpha diversity from the rarefied OTU table and export the table as a .txt file into Primer.

After looking over this thread and playing around with the normalize_table.py script, I am curious whether the core_diversity_analyses.py approach is still admissible. Also, going back to the original question, does it make sense to use the rarefied tables or the normalized tables to produce taxonomic summaries based on relative abundance, or should I just use the raw data even though there may be large differences in library size (or should I just have my amplicons re-sequenced)?

TonyWalters

Apr 22, 2016, 8:04:36 AM
to Qiime 1 Forum
Jay, the rarefied table isn't used for the taxonomic plots in core_diversity_analyses.py; the original input OTU table is used. When I generate taxonomic plots, I filter out the low sequence count samples (<1000 is a reasonable threshold), as these are often the ones that look the most aberrant, but I generally do not use rarefied or normalized tables.
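
If it helps, that step can be scripted before building the plots (the 1000-sequence cutoff is just the example threshold mentioned above):

    # Drop samples with fewer than 1000 total sequences, then build taxa summary plots
    filter_samples_from_otu_table.py -i otu_table.biom -o otu_table_min1000.biom -n 1000
    summarize_taxa_through_plots.py -i otu_table_min1000.biom -m map.txt -o taxa_plots/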

Jay T

Apr 22, 2016, 11:43:18 AM
to qiime...@googlegroups.com
Tony,

Correct, I understand that the core_diversity_analyses.py script uses the original OTU table, but part of the workflow for alpha and beta diversity analyses includes rarefying the original OTU table, correct? What I am asking, however, is whether you think differences in sequence library size affect the observed taxonomic distribution in the bar plots. I'll post two sets of bar plots that I generated using PHINCH. They show the relative abundances generated from a raw OTU table, where the number of sequences ranged anywhere between 2,000 and 100,000 per sample, and from a normalized OTU table (using normalize_table.py), where the number of sequences was normalized to around 1,500-3,500. I'm not sure whether to sequence a bit more for the samples that are low, or whether the distribution of taxa in my samples actually reflects the effect of my 3 treatments. The rarefaction curves seem to level off around 500 OTUs at around 1,500 sequences, so I am wondering if it even matters that one sample has 10X more sequences than the samples with comparatively lower counts. In summary, which of the above plots is more accurate, and why?

Thanks,
JT
Attachments: relative.png, abundance.png, relative_normalized.png, abundance_normalized.png

Andrew Krohn

Apr 22, 2016, 7:02:19 PM
to Qiime 1 Forum
My feeling is that if you use a normalized table for your analysis, you ought to be reporting the normalized results. Normalizing does some really cool stuff to the count data, but it also does things you might not like, so use with caution and keep an eye on how it alters the result.

In mock communities, I find normalizing really makes the results consistent in terms of expected relative abundances (this is seen in the plots you provided), and you will also notice it does a fine job of controlling constituent variances. However, if there are contaminants in the mock communities, those very small raw counts become dramatically inflated by normalization. This implies that low-abundance taxa may become over-represented in a normalized data set, and the obvious control is the amount of filtering you apply to your OTU table between taxonomy assignment and diversity analysis. I find the 0.005% across-the-table threshold from Bokulich et al. 2013 works very well most of the time. I am an advocate for normalizing your input table and using the normalized result to generate plots and stats for beta diversity, taxa summaries, and group comparisons. Alpha diversity should still be assessed by rarefying the raw data.
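
If you want to try that threshold, it can be applied directly with filter_otus_from_otu_table.py (0.005% written as a fraction):

    # Remove OTUs whose total count is below 0.005% of the whole table (Bokulich et al. 2013)
    filter_otus_from_otu_table.py -i otu_table.biom -o otu_table_bokulich.biom --min_count_fraction 0.00005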

As to Tony's point, he uses the input OTU table for summary plots, but I assume he is relativizing at that point, which makes the result approximately equivalent to a relativized table that was first rarefied. I agree with dumping low-count samples, and 1000 sounds very reasonable.

Your range of read depths would cause some concern, as this gets at the overdispersion problem described very well in the "waste not, want not" paper (McMurdie and Holmes, 2014). It may or may not cause any real problem, but using raw counts with such a spread of values could lead to clustering according to sequencing depth. I don't know how admissible this is, but I've been playing around with outlier tests to remove samples with extreme (high or low) sequencing depth in order to avoid such complications. I really like Rosner's ESD test for this (http://contchart.com/outliers.aspx), but you could iteratively apply Grubbs' test and get about the same result. The biggest problem for you would be that the sample(s) with 100k reads will likely detect things not seen in most other samples, and the relative abundances of these rare taxa can then be inflated by the normalization step. Even worse, if that highly sequenced sample carries more PCR and/or sequencing errors with it, those could also be inflated in quantity and you would be reporting false diversity.
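
Once you've settled on cutoffs from an outlier test (run outside QIIME, e.g. Rosner's ESD on the per-sample counts), acting on them is straightforward; the depth limits below are placeholders:

    # Inspect per-sample sequence counts
    biom summarize-table -i otu_table.biom -o per_sample_counts.txt

    # Exclude samples outside the chosen depth range before downstream analysis
    filter_samples_from_otu_table.py -i otu_table.biom -o otu_table_depth_trimmed.biom -n 2000 -x 50000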
