Different OTUs mapping to same taxa - effects of filter_otus_from_otu_table.py?

Angela

Aug 17, 2017, 2:39:06 PM

to Qiime 1 Forum

Hello,

I am examining my OTU table after running pick_closed_reference_otus against the latest SILVA database, and I noticed that multiple rows in the OTU table can correspond to the same taxon. For example, there are 30 different OTUs assigned to "D_0__Bacteria; D_1__Firmicutes; D_2__Bacilli; D_3__Lactobacillales; D_4__Lactobacillaceae; D_5__Lactobacillus; Ambiguous_taxa". If I were to use filter_otus_from_otu_table.py on the biom table right now, does that mean it will remove "rare OTUs" even though they could be classified as a very common genus, causing some genera to have lower abundance counts than actually observed? Should I first sum all the rows corresponding to the same taxon and then remove rare taxa using a custom script? Thanks.

Jose Antonio Navas Molina

Aug 18, 2017, 11:24:31 AM

to Qiime 1 Forum

Hi Angela,

It is quite normal to have multiple OTUs assigned to the same taxonomy string. Note that if they're different OTUs, it means they matched different reference sequences in the reference database, so they're different. They just happen to share the same taxonomy string.

Now, to your specific question about filter_otus_from_otu_table.py: it depends a bit on your definition of "rare OTUs". Low-abundance OTUs (i.e. OTUs with only a handful of sequences) are often filtered out because of a lack of trust in the validity of those sequences. I know there are different points of view on this topic, and some other users may disagree, but with so few sequences in an OTU you can't be sure whether it is a true sequence present in your dataset or just the result of sequencing/PCR error. If you want to perform your analysis at the sequence level, it would be better to use DADA2 or deblur instead of OTUs, since both operate at the sequence level.
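
To make the trade-off concrete, here is a minimal Python sketch of what happens to per-taxon totals when low-abundance OTUs are dropped before rows sharing a taxonomy string are summed. The table, the labels taxon_A and taxon_B, the threshold, and both function names are invented for illustration; this is not how the QIIME scripts are implemented.

```python
from collections import defaultdict

# Toy table, invented for illustration: OTU id -> (taxonomy label, total count).
toy_table = {
    "otu_1": ("taxon_A", 500),
    "otu_2": ("taxon_A", 3),
    "otu_3": ("taxon_A", 2),
    "otu_4": ("taxon_B", 4),
}


def filter_low_abundance(table, min_count):
    """Keep only OTUs whose total count is at least min_count (hypothetical helper)."""
    return {otu: (tax, n) for otu, (tax, n) in table.items() if n >= min_count}


def collapse_by_taxonomy(table):
    """Sum the counts of all OTUs that share the same taxonomy label (hypothetical helper)."""
    totals = defaultdict(int)
    for tax, n in table.values():
        totals[tax] += n
    return dict(totals)


# Collapse without any filtering.
unfiltered = collapse_by_taxonomy(toy_table)

# Drop OTUs with fewer than 5 sequences first, then collapse.
filtered = collapse_by_taxonomy(filter_low_abundance(toy_table, min_count=5))

print(unfiltered)
print(filtered)
```

In this toy case, filtering first lowers the total for taxon_A by the five reads in its two small OTUs and removes taxon_B entirely. Whether that is a loss of real signal or a removal of noise depends on how much you trust those low-count OTUs.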

On the other hand, if you are interested in removing taxa that you know shouldn't be in your dataset (i.e. they're contaminants), you can use filter_taxa_from_otu_table.py.
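
The idea behind taxonomy-based filtering can be sketched in a few lines of Python. This only illustrates the concept and is not the implementation of filter_taxa_from_otu_table.py; the table, the counts, the choice of unwanted taxon, and the function name are all assumptions made for the example.

```python
# Toy table, invented for illustration: OTU id -> (taxonomy string, total count).
toy_table = {
    "otu_1": ("D_0__Bacteria; D_1__Firmicutes; D_2__Bacilli", 120),
    "otu_2": ("D_0__Bacteria; D_1__Firmicutes; D_2__Bacilli", 7),
    "otu_3": ("D_0__Bacteria; D_1__Cyanobacteria; D_2__Chloroplast", 40),
}


def remove_taxa(table, unwanted):
    """Drop every OTU whose taxonomy string contains one of the unwanted levels.

    Hypothetical helper: levels are compared as exact strings after splitting
    the taxonomy on ';', so "D_2__Chloroplast" does not match a longer name.
    """
    kept = {}
    for otu, (tax, n) in table.items():
        levels = [level.strip() for level in tax.split(";")]
        if not any(name in levels for name in unwanted):
            kept[otu] = (tax, n)
    return kept


# Treat chloroplast sequences as unwanted for the sake of the example.
cleaned = remove_taxa(toy_table, unwanted=["D_2__Chloroplast"])
print(sorted(cleaned))
```

Unlike abundance-based filtering, this removes OTUs regardless of how many sequences they contain, so both abundant and rare OTUs from the unwanted taxon go, while rare OTUs from every other taxon stay.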