How to deal with unclassified/unassigned taxa in the QIIME pipeline?


shriram patel

Feb 15, 2017, 1:30:28 PM
to Qiime 1 Forum
Hello All, 

My question is related to the QIIME and MG-RAST pipelines.

I have been using QIIME for a long time for amplicon-based analysis. However, the major problem I face when picking OTUs with the de novo approach is the large number of OTUs that cannot be classified to genus level. Most of the most abundant taxa remain unclassified at genus level and are resolved only to family level. But when I uploaded the same data to the MG-RAST server, most of the taxonomic assignments went down to genus level. (For example, in QIIME the most abundant taxon was an unclassified "Ruminococcaceae", while in the MG-RAST result it was "Ruminococcus".) I did all the steps in QIIME, from demultiplexing to BIOM file generation. I am using the Greengenes 13_8 database for taxonomic assignment with uclust. Can anyone help me sort out this problem? All suggestions are welcome. Feel free to ask for any details related to this question; I would be glad to share them.

Thank You

With Best
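
A quick way to quantify the problem described above is to count how deep each OTU's lineage actually goes. The sketch below is only an illustration: it assumes Greengenes-style lineage strings (`k__...; p__...; ...; g__; s__`, where a bare prefix means no name at that rank). The example lineages and the helper names `deepest_rank` and `summarize` are made up for the example and are not part of QIIME.

```python
from collections import Counter

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def deepest_rank(lineage):
    """Return the deepest rank that carries a name, or 'unassigned'."""
    deepest = "unassigned"
    for rank, field in zip(RANKS, lineage.split(";")):
        name = field.strip()
        # Assumption: an empty rank is written as a bare prefix such as 'g__'.
        if "__" not in name or not name.split("__", 1)[1]:
            break
        deepest = rank
    return deepest


def summarize(lineages):
    """Count how many lineages stop at each rank."""
    return Counter(deepest_rank(lineage) for lineage in lineages)


# Made-up lineages for illustration only.
example = [
    "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__; s__",
    "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__Ruminococcus; s__",
    "Unassigned",
]
counts = summarize(example)
for rank, n in counts.items():
    print(rank, n)
```

Running the same tally on the taxonomy output of both pipelines would show whether the difference is concentrated at the family-to-genus step or spread across ranks.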

Antonio González Peña

Feb 15, 2017, 2:15:52 PM
to Qiime 1 Forum
If you want to avoid OTU clustering you can use qiime2/dada, qiime2/deblur or qiita/deblur; any of these should help with your resolution. Now, taxonomy assignment is difficult, and the more specific you get, the harder and less accurate it becomes. One of my favorite plots for showing how difficult this problem is: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1950982/figure/f1/, which basically tells you why we "trust" assignments only up to genus level.

Now, if you want to read more about MG-RAST taxonomy problems, take a look at: http://msystems.asm.org/content/1/3/e00050-16

Hope this helps. 
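
To make concrete why assignments often stop at family level, here is a minimal sketch of a consensus rule of the kind used by hit-based assigners such as QIIME 1's uclust method: take the top database hits for a sequence and keep each rank only while enough of those hits agree. This is a simplification, not QIIME's actual code. The function name `consensus_lineage`, the 0.51 agreement threshold and the three example hits are assumptions chosen for illustration.

```python
from collections import Counter


def consensus_lineage(hit_lineages, min_fraction=0.51):
    """Return the longest lineage prefix shared by at least min_fraction of hits."""
    if not hit_lineages:
        return ["Unassigned"]
    total = len(hit_lineages)
    consensus = []
    for depth in range(min(len(lineage) for lineage in hit_lineages)):
        # Compare whole prefixes so agreement at a rank implies agreement above it.
        prefixes = Counter(tuple(lineage[: depth + 1]) for lineage in hit_lineages)
        prefix, count = prefixes.most_common(1)[0]
        if count / total < min_fraction:
            break
        consensus = list(prefix)
    return consensus or ["Unassigned"]


# Hypothetical top hits: same family, three different genera.
family = ["k__Bacteria", "p__Firmicutes", "c__Clostridia",
          "o__Clostridiales", "f__Ruminococcaceae"]
hits = [
    family + ["g__Ruminococcus"],
    family + ["g__Oscillospira"],
    family + ["g__Faecalibacterium"],
]
result = consensus_lineage(hits)
print("; ".join(result))
```

With these made-up hits no genus reaches the threshold, so the result ends at `f__Ruminococcaceae`. A tool that reports only the single best hit would name a genus for the same read, which is one way two pipelines can disagree on identical data.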

shriram patel

Feb 16, 2017, 12:03:50 PM
to Qiime 1 Forum
Hi,

Thank you very much for your detailed explanation. It was very helpful, especially the commentary paper.
I will give the qiime2/dada and qiime2/deblur workflows a try.

Moreover, in some papers I have seen that 10-20% of bacterial sequences remained unclassified at phylum level. In addition, the authors screened for non-16S sequences by mapping raw reads to a reference database (at 80% sequence similarity). Could these unclassified sequences be non-bacterial, or are they novel phyla?
 

Thank You very much, 

Best

Antonio González Peña

Feb 16, 2017, 5:52:24 PM
to Qiime 1 Forum
Excellent question, and no idea. It will depend on how well curated the reference sequence db is and how much you trust it. Note that another option, and arguably a better one, is to take the sequences and insert them into a well-known tree; if a sequence creates a whole new branch, then it is not "real" ...

shriram patel

Feb 18, 2017, 1:58:14 AM
to Qiime 1 Forum
Hello, 

I agree that a well-curated reference database is required to avoid false taxonomic assignments. Suppose we choose to use the Greengenes 13_8 reference db for taxonomic assignment. Is there any additional step we can perform on the reference db (before taxonomy assignment) that would improve the accuracy of the assignment? Your suggestions here will help many folks who deal with 16S data analysis, especially me.

Best

Antonio González Peña

Feb 20, 2017, 12:56:07 PM
to Qiime 1 Forum
Well, I think it's pretty hard to have a silver bullet that works for every sample type (soil, water, fecal, oral, etc.), so the suggestion is to use your most trusted reference DB. I also suggest carefully reading the methods sections of your favorite papers that deal with the same sample type as yours. Note that we use GG13_8 as our default.

Jay T

Mar 7, 2017, 12:48:42 AM
to Qiime 1 Forum
Antonio - How do you take the deblurred sequence OTU table and convert the observation IDs to Greengenes taxonomy?

Thanks,
JT
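
One possible starting point, sketched under assumptions: in a deblur table the observation IDs are the sequences themselves, so they can be written out as FASTA and passed to whichever taxonomy assigner you already use with a Greengenes reference. The helper name `observation_ids_to_fasta`, the `seq1`-style labels and the short example sequences below are hypothetical.

```python
def observation_ids_to_fasta(observation_ids):
    """Build FASTA text from sequence-valued observation IDs.

    Returns the FASTA text and a label -> sequence map, so taxonomy
    assigned to each label can be joined back to the table afterwards.
    """
    label_to_seq = {}
    records = []
    for index, seq in enumerate(observation_ids, start=1):
        label = f"seq{index}"  # hypothetical short label
        label_to_seq[label] = seq
        records.append(f">{label}\n{seq}")
    return "\n".join(records) + "\n", label_to_seq


# Made-up, shortened sequences for illustration only.
ids = ["TACGTAGGGGGCAAGCGTTATCC", "TACGGAGGATCCGAGCGTTATCC"]
fasta_text, label_to_seq = observation_ids_to_fasta(ids)
print(fasta_text, end="")
```

The label map matters because the assigner reports taxonomy per FASTA label, while the table is still keyed by the full sequence.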