[picrust-users] Rarefy OTU table?

Johannes Espolin Roksund Hov

Jun 6, 2013, 8:27:16 AM

to picrus...@googlegroups.com

Hi,
One simple question: Is it necessary/advisable to rarefy the OTU table (e.g. to the smallest number of OTUs among the samples) before predicting the metagenomes? (Our aim is to use otu_category_significance.py to find associations of KEGG pathways in groups of samples.)

Or should rarefaction perhaps be done on the predicted_metagenome.biom (to the smallest number of genes)?

A final option would be to rarefy the collapsed file after categorize_by_function.py.

I realize there is a "normalize_by_copy_number.py" procedure, but the way I interpret the text, this "corrects" the number of genomes, since some OTUs have several copies of the 16S rRNA gene?

Thank you in advance for input on these issues!

Best wishes,
Johannes

Johannes Roksund Hov, MD PhD
Post.doc., Norwegian PSC Research Center / Resident, Dept. of gastroenterology
Division for cancer medicine, surgery and transplantation, Oslo University Hospital Rikshospitalet
Postal address: Pb 4950 Nydalen, N-0424 Oslo, Norway
Phone: +47 23 07 00 00 / +47 916 87 143
E-mail: j.e....@medisin.uio.no
http://www.ous-research.no/nopsc/

Jesse Zaneveld

Jun 6, 2013, 2:25:54 PM

to picrus...@googlegroups.com

Hi Johannes,

Thanks for your interest in PICRUSt.

> One simple question: Is it necessary/advisable to rarefy the OTU table (e.g. to the smallest number of OTUs among the samples) before predicting the metagenomes? (Our aim is to use otu_category_significance.py to find associations of KEGG pathways in groups of samples.)

Although it isn't strictly necessary to rarefy the OTU table in order to predict metagenomes, I agree that it is a good idea to do so if you want to apply QIIME's OTU category significance as part of your downstream analysis. I can think of arguments in favor of rarefying at the level of OTUs, predicted KOs, or KEGG pathways, but personally I would favor rarefying the original OTU table for a couple of reasons. First, this is the raw input data and so seems to most naturally reflect sampling effort. Second, we've actually done some control analyses testing how much of an effect lower sampling depth (or more severe rarefaction of the OTU table) has on PICRUSt predictions, using paired 16S/metagenome datasets from a collection of soil samples. (We haven't yet done similar tests for KO or pathway count rarefaction.)

For OTU table rarefactions, we found that the PICRUSt predictions were not much degraded until you reached very shallow depths (hundreds of OTU counts). This is likely because many communities have steep rank abundance curves, so a small number of abundant OTUs accounts for most of the counts. However, a shallower depth might of course miss a rare OTU carrying a unique function of interest.
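For readers who want to see what rarefying an OTU table to the smallest sample depth means in practice, here is a minimal sketch in plain Python. It is not QIIME's implementation: the function name `rarefy`, the toy table, and the sample and OTU names are all hypothetical, and it assumes rarefaction means drawing reads at random without replacement.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` reads, without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one entry per read, then draw `depth` of them.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    rarefied = {otu: 0 for otu in counts}
    for otu in rng.sample(reads, depth):
        rarefied[otu] += 1
    return rarefied


# Hypothetical toy table: sample -> {OTU id -> read count}
otu_table = {
    "sampleA": {"otu1": 500, "otu2": 300, "otu3": 5},
    "sampleB": {"otu1": 100, "otu2": 40, "otu3": 10},
}

# Rarefy every sample to the depth of the shallowest sample.
depth = min(sum(counts.values()) for counts in otu_table.values())
rarefied_table = {s: rarefy(c, depth) for s, c in otu_table.items()}

for sample, counts in rarefied_table.items():
    print(sample, counts, sum(counts.values()))
```

After this step every sample has the same total count, so differences between samples no longer reflect sequencing depth. The rare `otu3` in `sampleA` can easily drop to zero, which is the risk mentioned above.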

> I realize there is a "normalize_by_copy_number.py" procedure, but the way I interpret the text, this "corrects" the number of genomes, since some OTUs have several copies of the 16S rRNA gene?

That's right. normalize_by_copy_number.py attempts to account for variable 16S rRNA copy numbers between organisms by dividing the observed 16S relative abundance for each OTU by the predicted 16S copy number per organism in that OTU, which gives a predicted organismal relative abundance. So you are exactly right in thinking that the script would not account for sampling effort.
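As a small worked example of that division, here is a sketch in plain Python. It only illustrates the arithmetic described above and is not PICRUSt's code; the function below merely borrows the script's name, and the OTU counts and copy numbers are made-up values.

```python
def normalize_by_copy_number(otu_counts, copy_numbers):
    """Divide each OTU's 16S count by its predicted 16S copies per organism."""
    return {otu: n / copy_numbers[otu] for otu, n in otu_counts.items()}


# Hypothetical values: otu1 carries six 16S copies per genome, otu2 only one.
otu_counts = {"otu1": 600, "otu2": 300}
copy_numbers = {"otu1": 6, "otu2": 1}

organisms = normalize_by_copy_number(otu_counts, copy_numbers)
total = sum(organisms.values())
relative = {otu: n / total for otu, n in organisms.items()}

print(organisms)  # {'otu1': 100.0, 'otu2': 300.0}
print(relative)   # {'otu1': 0.25, 'otu2': 0.75}
```

By raw 16S counts otu1 looks twice as abundant as otu2, but after the correction it is the minority organism. Nothing in this step changes with sequencing depth, which is why it does not replace rarefaction.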

Let me know if you have any more questions and I'd be happy to help.

All the best,
Jesse

Johannes Espolin Roksund Hov

Jun 10, 2013, 4:15:54 AM

to picrus...@googlegroups.com

Hi Jesse,

Thank you for this great advice. I have one follow-up question, then: What about rarefaction before generating beta diversity plots, e.g. on the pathway level?

Best wishes,

Johannes

Jesse Zaneveld

Jun 10, 2013, 7:18:21 PM

to picrus...@googlegroups.com

Hi Johannes,

My instinct is that rarefaction at either the input OTU level or the pathway level will likely be reasonably equivalent in terms of overall beta-diversity trends. However, a given number of pathway counts will not reflect the same rarefaction depth as the same number of OTU counts. The same applies to KO vs. pathway counts (since each KO can be in zero, one, or several pathways). Whichever way you go, you may want to pick a reasonable rarefaction depth based on the range of values for that annotation type. I can't say much more than that with confidence, since I haven't done any additional comparisons contrasting rarefaction at OTU vs. KO vs. pathway vs. category levels.
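To illustrate why KO totals and pathway totals are not interchangeable as rarefaction depths, here is a toy sketch in plain Python. The KO-to-pathway mapping and the counts are invented, and the sketch assumes that a KO's full count is added to every pathway it belongs to; that is a simplifying assumption for illustration, not a statement of how categorize_by_function.py works internally.

```python
def collapse_to_pathways(ko_counts, ko_to_pathways):
    """Sum KO counts into pathway counts; a KO may hit zero, one, or many pathways."""
    pathway_counts = {}
    for ko, n in ko_counts.items():
        for pathway in ko_to_pathways.get(ko, []):
            pathway_counts[pathway] = pathway_counts.get(pathway, 0) + n
    return pathway_counts


# Hypothetical predicted KO counts for one sample.
ko_counts = {"K00001": 10, "K00002": 4, "K00003": 6}

# Hypothetical mapping: one KO in two pathways, one in a single pathway, one in none.
ko_to_pathways = {
    "K00001": ["pathwayA", "pathwayB"],
    "K00002": ["pathwayA"],
    "K00003": [],
}

pathway_counts = collapse_to_pathways(ko_counts, ko_to_pathways)

print(sum(ko_counts.values()))       # 20 KO counts
print(pathway_counts)                # {'pathwayA': 14, 'pathwayB': 10}
print(sum(pathway_counts.values()))  # 24 pathway counts
```

The same sample has 20 KO counts but 24 pathway counts, so a depth chosen for one table does not carry over to the other.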

My recommendation would be that either approach you suggest would probably work, but if you would like to be sure, the safest thing to do would be to test each approach and compare the results (since these steps are automated in QIIME it probably won't take too long). It is always reassuring when important biological results are independent of small deviations in methodology (and a bit scary when they are not).

All the best,
Jesse

Kristian Holm

Jun 12, 2013, 4:44:52 AM

to picrus...@googlegroups.com

Hi Jesse,

Sorry, but I'm still a little confused.

Regarding QIIME's OTU category significance, I thought the statistical tests in that script were based on relative abundances and therefore did not need rarefying? categorize_by_function.py produces files with counts, but after processing them with otu_category_significance.py the result file contains percentages.

So far I have only rarefied the metagenome_predictions.biom on the fly when producing beta diversity plots (rarefied to the lowest number of genes).

Here is an overview of what I have been doing until now:

fasta -> pick_closed_reference_otus.py -> normalize_by_copy_number.py -> predict_metagenome.py -> beta_diversity_through_plots.py (using rarefaction)

fasta -> pick_closed_reference_otus.py -> normalize_by_copy_number.py -> predict_metagenome.py -> categorize_by_function.py -> otu_category_significance.py

fasta -> pick_closed_reference_otus.py -> normalize_by_copy_number.py -> predict_metagenome.py -> categorize_by_function.py -> summarize_taxa_through_plots.py

Does this look OK?

If rarefaction is in fact needed, I suppose it should be done on either the initial OTU table or at a later stage (pathway level), not both? That is, if I rarefy the initial OTU table, I should not do another rarefaction at the pathway level (i.e. when running beta_diversity_through_plots.py)?