How to choose the minimum number of reads per OTU?

Kruttika Phalnikar

May 9, 2017, 9:12:58 AM

to Qiime 1 Forum, Rittik Deb

Hi all,

Thank you for reading this post!

>I am analyzing 16S MiSeq data for more than 100 samples.
>When I look at the OTU table, which gives the reads per OTU, I see a huge number of OTUs across all samples (>70,000).
>Some of these OTUs might be spurious, arising from the MiSeq chemistry or for some other reason, and in any case confidence in rare OTUs is low. I want to remove OTUs based on a cutoff, but I am unsure which method to prefer. I have thought of the following options:

1) Remove OTUs that have fewer than X reads. I don't know how to choose that number, since I couldn't find a clear reference stating what the standard is.
2) Remove OTUs that fall below a relative abundance of X, e.g. remove all OTUs with <0.01 relative abundance.

For both of these options, should I apply the cutoff on a per-sample basis or across the entire sample set?

I came across a reference that says to apply a cutoff of 0.005% across the entire OTU set for all samples in the comparison (if I interpret it correctly). This is the paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531572/pdf/nihms420659.pdf

Can someone please tell me if there is a standard way to filter OTUs from an OTU table? Should the minimum OTU size be 2, 10, 20, etc., or should the relative abundance cutoff be 0.1, 0.01, 0.005, etc.?

Thank you,
Kruttika

Greg Caporaso

May 9, 2017, 6:12:03 PM

to Qiime 1 Forum, debr...@gmail.com

Hello,

Unfortunately, there is no standard for this. The 0.005% cut-off is a heuristic that is sometimes used and has been shown to be helpful in the paper that you're citing. Another common approach is to filter out OTUs that are present in only a single sample. You can apply both of these filters with filter_otus_from_otu_table.py. I most commonly filter OTUs that are only observed in a single sample by calling filter_otus_from_otu_table.py -s 2 ..., but there are no studies that I'm aware of that compare these approaches.

If you're concerned about the impact that choosing one of these filters might have on your results, a good option would be to generate results with a few of the filters and see if your final conclusions are robust to the OTU filtering process that you applied. If so (which is likely, based on my experience), then you should be safe to report the results from any of the filters that you applied.
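To make the difference between the two filters concrete, here is a minimal Python sketch on a made-up OTU table. It is not QIIME's implementation: the function names, the toy counts, and the `>=` comparison are all assumptions for illustration. The abundance filter is applied across the whole table (total reads over all samples), and 0.005% is written as the fraction 0.00005, which I believe is also the form the script's --min_count_fraction option expects.

```python
# Toy OTU table: keys are OTU IDs, values are read counts per sample.
# All IDs and counts are invented for illustration.
otu_table = {
    "OTU_1": [40000, 35000, 24000],  # abundant, in every sample
    "OTU_2": [300, 0, 250],          # moderate, in two samples
    "OTU_3": [0, 400, 0],            # fairly abundant, but in one sample only
    "OTU_4": [2, 1, 0],              # rare, in two samples
    "OTU_5": [1, 0, 0],              # singleton
}


def filter_by_min_count_fraction(table, min_fraction):
    """Keep OTUs whose total reads are at least min_fraction of all reads.

    The fraction is computed across the entire table, not per sample.
    (Hypothetical helper, not part of QIIME.)
    """
    total_reads = sum(sum(counts) for counts in table.values())
    return {
        otu: counts
        for otu, counts in table.items()
        if sum(counts) / total_reads >= min_fraction
    }


def filter_by_min_samples(table, min_samples):
    """Keep OTUs observed (count > 0) in at least min_samples samples.

    (Hypothetical helper, not part of QIIME.)
    """
    return {
        otu: counts
        for otu, counts in table.items()
        if sum(1 for c in counts if c > 0) >= min_samples
    }


# 0.005% expressed as a fraction.
kept_by_fraction = filter_by_min_count_fraction(otu_table, 0.00005)
# Drop OTUs seen in only a single sample.
kept_by_samples = filter_by_min_samples(otu_table, 2)

print("Abundance filter keeps:", sorted(kept_by_fraction))
print("Min-samples filter keeps:", sorted(kept_by_samples))
print("Kept by both:", sorted(set(kept_by_fraction) & set(kept_by_samples)))
```

On this toy table the two filters disagree on OTU_3 (abundant but seen in one sample) and OTU_4 (rare but seen in two samples), which is the kind of difference worth checking your downstream conclusions against.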

Another alternative that you may be interested in is to use QIIME 2 instead of QIIME 1. QIIME 2 uses improved quality filtering methods - you can choose between DADA2 and Deblur, as of this writing - so this filtering step is no longer important.

Hope this helps!

Greg
