summarize_taxa.py: what is the "Unassigned" OTU in the output text file


Capricy Gao

Sep 2, 2016, 5:21:51 PM
to Qiime 1 Forum
Hello there,

Could anyone help me better understand the output of this command? In the "#OTU ID" column, the first entry is "Unassigned".

Since my input BIOM file contains a lot of "New.ReferenceOTU"s, I tried to calculate their percentage, which seems to be much higher than that of the "Unassigned" entry. I also wonder whether those "New.ReferenceOTU"s are really included in the summary.

Thanks a lot!

C.

Jose Antonio Navas Molina

Sep 6, 2016, 10:32:55 AM
to Qiime 1 Forum
Hi C.

I'm not sure which command you're referring to; can you elaborate? By summary, I'm assuming that you're talking about the biom summarize-table command. If that's the case, then all the contents of the BIOM table are included.
"Unassigned" OTUs are OTUs that couldn't be assigned to any taxonomy in your reference DB. The New.ReferenceOTUs are OTUs that were generated in step 2 of the open-reference OTU picking pipeline.
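To illustrate the distinction: "New.ReferenceOTU" is part of the OTU ID, while "Unassigned" is a taxonomy label, so the two percentages are computed over different groupings and need not match. Here is a minimal sketch in plain Python. The toy table, its counts, and the helper name `fraction_where` are made up for illustration and are not QIIME code or real data.

```python
# Hypothetical toy table: OTU ID -> (taxonomy string, total read count).
# IDs, taxonomy strings, and counts are invented for illustration only.
otus = {
    "4479944": ("k__Bacteria; p__Firmicutes", 500),
    "New.ReferenceOTU12": ("k__Bacteria; p__Proteobacteria", 300),
    "New.ReferenceOTU57": ("Unassigned", 150),
    "New.CleanUp.ReferenceOTU3": ("Unassigned", 50),
}


def fraction_where(table, predicate):
    """Fraction of all reads that belong to OTUs matching the predicate."""
    total = sum(count for _, count in table.values())
    selected = sum(
        count
        for otu_id, (taxonomy, count) in table.items()
        if predicate(otu_id, taxonomy)
    )
    return selected / total


# Grouping by OTU ID prefix (how the OTU was picked).
new_ref = fraction_where(otus, lambda otu_id, tax: otu_id.startswith("New.ReferenceOTU"))

# Grouping by taxonomy label (how the OTU was classified).
unassigned = fraction_where(otus, lambda otu_id, tax: tax == "Unassigned")

print(f"New.ReferenceOTU reads: {new_ref:.1%}")
print(f"Unassigned reads:       {unassigned:.1%}")
```

In this toy table, one New.ReferenceOTU still receives a taxonomy, so the New.ReferenceOTU share is larger than the Unassigned share.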

Cheers,

Capricy Gao

Sep 6, 2016, 3:50:13 PM
to Qiime 1 Forum
Hello,

Thank you for the response. I was referring to the exact command: summarize_taxa.py from QIIME. But I think I have figured out what happened: the New.ReferenceOTU sequences were subjected to taxonomy assignment using uclust with 90% similarity as the criterion, and those that remained unmatched were labelled "Unassigned". Please correct me if I am wrong.

Thanks!

C.

Jose Antonio Navas Molina

Sep 7, 2016, 10:36:21 AM
to Qiime 1 Forum
Hi C,

Judging by the names of your OTUs, it looks like you've run the open-reference workflow. In that workflow, once all the OTUs have been picked, the next step is to assign taxonomy, so all the OTUs in your table have gone through taxonomy assignment (according to your email, using the uclust taxonomy assigner). The name of an OTU therefore makes no difference to how taxonomy was assigned to its sequence.

Next, the "Unassigned" label means that the taxonomy assigner could not find any match for that OTU in the reference database. If you're interested in seeing what these are, I recommend grabbing the representative sequences of those OTUs and BLASTing them. Then you can decide what to do with those OTUs.
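One way to grab those representative sequences is sketched below in plain Python. It assumes a tab-separated taxonomy assignment file with the OTU ID in the first column and the taxonomy string in the second, and a representative-set FASTA whose header lines start with the OTU ID. Those layouts, the function names, and the toy records are assumptions for illustration, so please check them against your own files.

```python
def unassigned_ids(tax_lines):
    """Return the IDs of OTUs whose taxonomy column is exactly 'Unassigned'.

    Assumes tab-separated lines: OTU ID, taxonomy, then any further columns.
    """
    ids = set()
    for line in tax_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1].strip() == "Unassigned":
            ids.add(fields[0])
    return ids


def filter_fasta(fasta_lines, wanted):
    """Yield the FASTA lines of records whose ID is in `wanted`.

    The record ID is taken to be the first whitespace-delimited token
    after '>' on the header line.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in wanted
        if keep:
            yield line


# Toy input standing in for real files (invented records).
tax = [
    "OTU1\tk__Bacteria; p__Firmicutes\t1.00\t3\n",
    "New.ReferenceOTU7\tUnassigned\t1.00\t1\n",
]
fasta = [
    ">OTU1 sampleA_12\n", "ACGT\n",
    ">New.ReferenceOTU7 sampleB_3\n", "TTGA\n", "CCAA\n",
]

selected = list(filter_fasta(fasta, unassigned_ids(tax)))
print("".join(selected), end="")
```

With real data, open file handles can be passed in place of the toy lists, and the selected lines written to a new FASTA file that is then submitted to BLAST.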

Cheers,

Capricy Gao

Sep 15, 2016, 11:33:02 AM
to Qiime 1 Forum
Thanks for all the responses. I have a further question about taxonomy assignment: when the 'uclust' method is applied, are the input sequences clustered first and the cluster centroids then searched against the database, or is each input sequence searched against the database individually?

Thanks,

C.


Jose Antonio Navas Molina

Sep 16, 2016, 11:57:54 AM
to Qiime 1 Forum
Hi C,

The latter: each input sequence is searched against the database individually. Here is the pre-print where the taxonomy assigner is explained: https://peerj.com/preprints/934/
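As a rough illustration of the per-sequence idea, here is a sketch of a consensus step applied to the database hits of a single query sequence. It is not the QIIME implementation: the function name, the `min_fraction` value, and the toy lineages are assumptions, and the search that produces the hits is left out.

```python
from collections import Counter


def consensus_taxonomy(hit_lineages, min_fraction=0.51):
    """Return the deepest lineage shared by at least `min_fraction` of hits.

    `hit_lineages` holds one list of taxonomic levels per database hit for
    a single query sequence. With no hits, the query is 'Unassigned'.
    """
    if not hit_lineages:
        return ["Unassigned"]

    consensus = []
    depth = max(len(lineage) for lineage in hit_lineages)
    for level in range(depth):
        # Count lineage prefixes down to this level among hits that reach it.
        prefixes = Counter(
            tuple(lineage[: level + 1])
            for lineage in hit_lineages
            if len(lineage) > level
        )
        prefix, count = prefixes.most_common(1)[0]
        if count / len(hit_lineages) < min_fraction:
            break  # not enough agreement at this level; keep the shallower result
        consensus = list(prefix)
    return consensus or ["Unassigned"]


# Toy hits for one query (invented lineages).
hits = [
    ["k__Bacteria", "p__Firmicutes", "c__Bacilli"],
    ["k__Bacteria", "p__Firmicutes", "c__Clostridia"],
    ["k__Bacteria", "p__Firmicutes", "c__Clostridia"],
]
print(consensus_taxonomy(hits))
print(consensus_taxonomy([]))
```

Each query is handled on its own here, so a sequence with no acceptable hits ends up "Unassigned" regardless of which OTU it represents.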

Cheers,

Capricy Gao

Sep 20, 2016, 9:01:44 PM
to Qiime 1 Forum
Thank you very much for the answer, and for recommending the paper. It is a bit surprising that the paper suggests using a lower similarity threshold (0.8 instead of 0.9) for uclust during taxonomic classification...

Kruttika Phalnikar

Nov 15, 2016, 4:05:05 AM
to Qiime 1 Forum
Hi Jose,

As you mentioned in this reply: "If you're interested in seeing what these unassigned reads are, I recommend grabbing the representative sequences of those OTUs and BLASTing them."
I would like to do exactly that. Could you please elaborate on how to go about it?
Thanks in advance!