Too many "Unassigned" OTUs after running assign_taxonomy.py


Lorinda

Feb 20, 2014, 6:18:21 PM
to qiime...@googlegroups.com
Hello,

After running assign_taxonomy.py, 99.8% of my sequences remain "Unassigned." However, when I BLAST these sequences manually I get good species matches like the one below:

 I am running the following command on QIIME 1.8, using the latest UNITE ITS database:

assign_taxonomy.py  -i /Users/lorinda/Desktop/Q30/rep_setQ30.fna -t /Users/lorinda/Desktop/Q30/UNITE_Feb/97_otus.txt -r /Users/lorinda/Desktop/Q30/UNITE_Feb/97_otus.fasta -o /Users/lorinda/Desktop/Q30/UCLUST.9_97_otus



I attached my original rep_set fasta file as well as the output fasta file, the log file, and the reference database file. It seems like I shouldn't be getting nearly this many unassigned OTUs, but I'm not sure what is going wrong.
rep_setQ30_tax_assignments.txt
rep_setQ30_tax_assignments.log
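
For readers who want to reproduce a figure like the 99.8% above, here is a minimal sketch that counts "Unassigned" rows in a taxonomy assignments file. It assumes the file is tab-separated with the OTU ID in the first column and the taxonomy string in the second; the function name and the example rows are hypothetical and not taken from the attached files.

```python
def unassigned_fraction(lines):
    """Return (unassigned, total) for tab-separated assignment lines.

    Assumes column 1 is the OTU ID and column 2 is the taxonomy string.
    """
    total = 0
    unassigned = 0
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        total += 1
        if fields[1].strip().lower() == "unassigned":
            unassigned += 1
    return unassigned, total


# Hypothetical rows for illustration only.
example = [
    "otu_1\tk__Fungi;p__Ascomycota\t1.00\t3",
    "otu_2\tUnassigned\t1.00\t1",
    "otu_3\tUnassigned\t1.00\t1",
]
n_unassigned, n_total = unassigned_fraction(example)
print(f"{n_unassigned}/{n_total} unassigned")
```

To run it on a real file, pass an open file handle instead of the list, e.g. `unassigned_fraction(open("rep_setQ30_tax_assignments.txt"))`.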

Lorinda

Feb 20, 2014, 6:21:45 PM
rep_setQ30.fna.zip

Lorinda

Feb 20, 2014, 6:22:43 PM
97_otus.fasta.zip

Tony Walters

Feb 20, 2014, 6:26:04 PM
Lorinda, do those BLAST results cover the entire length of the sequence? Are there overhangs at the ends that do not match? I'm asking because if non-target bases remain in your sequences (e.g., reads that run past the reverse primer and into barcode/adapter sequence), they will strongly affect the RDP classifier (and the related uclust classifier), while BLAST, which allows partial overlap between the query and the reference database, will not be as strongly affected.

Can you check towards the ends of your reads for the reverse primer sequence as well? If it is there, you can run truncate_reverse_primer.py (http://qiime.org/scripts/truncate_reverse_primer.html) on your reads. You'd want to do this on the post-demultiplexed reads (seqs.fna) and then redo the OTU picking process through making an OTU table.
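
One way to do the primer check suggested above is a small script like the following sketch. It assumes the reverse primer would appear in the reads as its reverse complement (reads that run past the primer site), uses exact matching only, and does not handle degenerate IUPAC bases. The primer and reads shown are made-up placeholders, not real ITS primers.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a plain A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (label, sequence) pairs; label is the text before the first space."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            header = line[1:].split()
            label, chunks = (header[0] if header else ""), []
        elif line and label is not None:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def reads_with_primer(fasta_lines, reverse_primer):
    """Return labels of reads containing the reverse-complemented primer."""
    target = reverse_complement(reverse_primer).upper()
    return [label for label, seq in read_fasta(fasta_lines)
            if target in seq.upper()]


# Placeholder data: "AAGGCC" stands in for a real reverse primer.
example_reads = [
    ">read_1 sample_A",
    "TTTTGGCCTTAC",
    ">read_2 sample_A",
    "TTTTAAGGCCAC",
]
print(reads_with_primer(example_reads, "AAGGCC"))
```

If only a handful of reads are reported, as Lorinda found later in the thread, leftover primer sequence is unlikely to be the cause.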



Lorinda

Feb 21, 2014, 2:05:37 PM
Hi Tony, 
The BLAST results cover the entire length of the sequence.  Searching my rep_set for the reverse primer returned 1 sequence out of ~12,000 that still contained the reverse primer sequence.  Could it be the new UNITE database?  I see I am not the only person who has encountered this issue.

Tony Walters

Feb 21, 2014, 2:13:18 PM
Possibly. Can you try an older version of UNITE? Also, there are two versions of UNITE for each of the new releases; did you use:
or 

It would be good to know which one, if any, of those works.

Lorinda

Feb 21, 2014, 2:25:21 PM
I originally used the http://unite.ut.ee/sh_files/sh_qiime_release_s_09.02.2014.zip version, but the January 2014 and December 2013 releases return the same results.



Tony Walters

Feb 21, 2014, 2:29:04 PM
Can you try this one https://github.com/downloads/qiime/its-reference-otus/its_12_11_otus.tar.gz

As we don't maintain these databases, we'll have to forward a query to those who do if the 12_11 database works but the newer ones do not. If you want to attach some sequences that are failing to assign, that might be handy for troubleshooting on their end. Also, if you use -m blast as the method, does that yield better hits than RDP/uclust?



Lorinda

Feb 22, 2014, 8:31:55 PM
This database (https://github.com/downloads/qiime/its-reference-otus/its_12_11_otus.tar.gz) worked much better. I now have a more reasonable number of "unassigned" OTUs. Any idea when we can expect the more recent UNITE databases to be functioning?



Tony Walters

Feb 22, 2014, 10:05:32 PM
Lorinda,

I've sent an email to the maintainer of UNITE. Could you supply a few sequences from OTUs that were unclassified with the http://unite.ut.ee/sh_files/sh_qiime_release_s_09.02.2014.zip version but worked fine in the older version (12_10)? I think this might help to determine why there are more unclassified sequences with the newer release. Also, which of the reference databases/taxonomy mapping files did you use with the http://unite.ut.ee/sh_files/sh_qiime_release_s_09.02.2014.zip release?



Lorinda

Feb 25, 2014, 12:04:45 PM
Hey Tony,

I've used equivalent taxonomy mapping files for each version of the UNITE reference database.    I've attached a file containing 20 sequences that had taxon assignments in the 12_10 database but not the 09.02.14 release using all the same parameters.  Hope this helps.


UNASSIGNED_SEQUENCE_SET.xlsx

Tony Walters

Feb 25, 2014, 12:33:50 PM
Lorinda, I've forwarded these sequences to the maintainer of UNITE. Could you post the commands you used as well? You can type: history
in the terminal to see your prior commands.



Lorinda

Feb 25, 2014, 1:22:34 PM

I picked OTUs using an EC2 instance, but the command failed at "assign_taxonomy" due to an error in the parameters file. I used the OTU table and rep_set.fna file output by this command to run the assign_taxonomy.py command separately.

The commands I used for assigning taxonomy all followed the same format as the one below:


assign_taxonomy.py  -i /Users/lorinda/Desktop/Q30/rep_setQ30.fna -t /Users/lorinda/Desktop/Q30/UNITE_Feb/97_otus.txt -r /Users/lorinda/Desktop/Q30/UNITE_Feb/97_otus.fasta -o /Users/lorinda/Desktop/Q30/UCLUST.9_97_Feb





Lorinda

Feb 25, 2014, 1:48:46 PM
Also,  it appears that when I assign taxonomy using the 12_10 database but with a .97 uclust similarity threshold, I seem to get about the same number of unassigned OTUs as when I use the newer UNITE databases and the default similarity threshold of .9.  



Lorinda

Feb 25, 2014, 2:04:59 PM
Actually, the output file from the 12_10 database (97_otus.fasta) using .97 uclust similarity is nearly identical to the output file from the 02_09 version (97_otus.fasta) with .9 uclust similarity, except that the 02_09 version has more complete species descriptions for OTUs where taxonomy has been assigned. I'm not sure if this will help, but I attached these two files for you to compare.
UCLUST.97_12_10_rep_set.txt
UCLUST.9_02_09_rep_set.txt

Tony Walters

Feb 25, 2014, 4:56:17 PM
Lorinda,

Something else that might be worth trying: can you run ITS Extractor (http://www.emerencia.org/FungalITSextractor.html) on the input sequences and see if that improves the assignment? It's possible that the sequences have some remaining flanking regions that don't match the ITS database and are interfering with the assignments.

Lorinda

Feb 26, 2014, 4:56:33 PM
Cutting down my sequences to contain only ITS2 seems to have helped out quite a lot.  I am seeing many more taxon assignments.  However, I am now unable to summarize taxa through plots and keep getting the following error:

"Metadata category '%s' not in OTU %s. Can't continue. Did you pass the correct metadata identifier?" % (md_identifier,otu_id)
KeyError: u"Metadata category 'taxonomy' not in OTU SH216503.06FU_AF444373_refs. Can't continue. Did you pass the correct metadata identifier?"

I am doing everything the same as before, so my only guess is that the error is in the rep_set FASTA file output by the Fungal ITS Extractor (ITS2.fasta). However, when I compare this output to the rep_set.fna output from pick_open_reference_otus.py, they appear almost identical.

I've attached my new trimmed FASTA file, along with the OTU table I've been using all along and the assign_taxonomy.py output.

I'm using these commands to make the OTU table used for summarize_taxa_through_plots.py:

assign_taxonomy.py -i /Users/lorinda/Desktop/Q30/ITS2.fasta -t /Users/lorinda/Desktop/Q30/UNITE_Feb/97_otus.txt -r /Users/lorinda/Desktop/Q30/UNITE_Feb/97_otus.fasta -o /Users/lorinda/Desktop/Q30/UCLUST.97_.97_Feb/ITS2/ --uclust_similarity .97


biom add-metadata -i /Users/lorinda/Desktop/Q30/otu_table_Q30.biom -o /Users/lorinda/Desktop/Q30/UCLUST.97_.97_Feb/ITS2/otus_w_taxa.biom --observation-metadata-fp /Users/lorinda/Desktop/Q30/UCLUST.97_.97_Feb/ITS2/ITS2_tax_assignments.txt --sc-separated taxonomy


summarize_taxa_through_plots.py -i /Users/lorinda/Desktop/Q30/UCLUST.97_.97_Feb/ITS2/otus_w_taxa.biom   -o /Users/lorinda/Desktop/Q30/UCLUST.97_.97_Feb/ITS2/TaxaSummary -m /Users/lorinda/Desktop/Q20/merged_mappingfile2.txt -f



Thanks!
ITS2.fasta
otu_table_Q30.biom
ITS2_rep_setQ30_tax_assignments.txt

Tony Walters

Feb 26, 2014, 5:01:40 PM
Lorinda,

I'm not seeing the taxonomy in the final OTU table. Can you try using the make_otu_table.py (http://qiime.org/scripts/make_otu_table.html) command using the new taxonomic assignments that were generated (/Users/lorinda/Desktop/Q30/UCLUST.97_.97_Feb/ITS2/ITS2_tax_assignments.txt) as the -t input and the original OTU mapping file (should be named something like _otus.txt in the output directory of OTU picking) as the -i input?

Lorinda

Feb 26, 2014, 5:28:47 PM
That was the original OTU table before taxon assignments, here is the otu table generated after using the biom add-metadata command:
otus_w_taxa.biom

Tony Walters

Feb 26, 2014, 5:39:16 PM
Lorinda, 

When I used the biom convert command on your attached table, there is no taxonomy. Can you please try the make_otu_table.py command and then do your summarize_taxa_through_plots on that?
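
A quick way to check a table for taxonomy without converting it is sketched below. It assumes the .biom file is in the JSON-based BIOM 1.0 format (where each entry in `rows` carries an `id` and a `metadata` field) rather than HDF5; the function name and the tiny example table are hypothetical.

```python
import json


def rows_missing_taxonomy(biom_json_text, key="taxonomy"):
    """Return IDs of observations (OTUs) whose metadata lacks `key`.

    Assumes JSON-formatted BIOM 1.0: a top-level "rows" list whose
    entries look like {"id": ..., "metadata": {...} or null}.
    """
    table = json.loads(biom_json_text)
    missing = []
    for row in table.get("rows", []):
        metadata = row.get("metadata") or {}
        if key not in metadata:
            missing.append(row["id"])
    return missing


# Minimal made-up table: only the fields this check reads.
example_table = json.dumps({
    "rows": [
        {"id": "otu_1", "metadata": {"taxonomy": ["k__Fungi", "p__Ascomycota"]}},
        {"id": "otu_2", "metadata": None},
    ]
})
print(rows_missing_taxonomy(example_table))
```

If every OTU ID comes back as missing, the metadata was never attached, which matches the KeyError that summarize_taxa_through_plots.py reported earlier in the thread.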

Lorinda

Feb 26, 2014, 6:35:20 PM
OK, I tried making an OTU table. For some reason an OTU map file was never generated by my pick_open_reference_otus.py command. I tried converting my existing OTU table to an OTU matrix .txt file and using that instead (not sure if this is legit). The summarize taxa command now works, but I get a whole new taxonomy category labeled "none" that makes up about 90% of all my sequences.

Tony Walters

Feb 26, 2014, 6:37:38 PM
That won't work. Do you see a directory called uclust_picked_otus or usearch_picked_otus? Look for the OTU mapping file (_otus.txt) in there.

Lorinda Hunt

Feb 26, 2014, 7:09:30 PM
No, I don't have it. It was never generated. The only UCLUST_ref_picked_otus folders on my computer are from the QIIME test files, but I think I should be able to work around this. My original .biom and rep_set.fna files generated by pick_open_reference_otus.py have worked (and still do) with the summarize_taxa_through_plots.py command. The only difference now is that my new rep_set has been run through the Fungal ITS Extractor, and the resulting trimmed .fasta file seems to be causing problems when I try to assign taxonomy and add it to a .biom file (this becomes apparent when I try to run summarize_taxa_through_plots.py). I've attached my original rep set and OTU table, along with my taxon assignments and the OTU table with taxa generated using the UNITE database. Thanks so much for your help.



ITS2.fasta
Original_OTU_table.biom copy
ITS2_tax_assignments.txt
OTUtable_w_tax.biom

Tony Walters

Feb 26, 2014, 9:51:51 PM
Lorinda, 

QIIME could not have created an OTU table without the OTU mapping file; maybe it was deleted?

There are discrepancies between the assignment IDs and your OTU IDs; so the output of ITS Extractor isn't directly compatible with QIIME.

Example:
>SH000485.06FU_AY612334_reps_singleton 330_125093
in the original rep_setQ30.fna.zip file you attached.
The text after the first space is stripped off, so the OTU ID looks like this in the OTU table:
SH000485.06FU_AY612334_reps_singleton
But it's truncated in the ITS2.fasta file:
>SH000485.06FU_AY612334_reps_si
And this results in a truncated form in the taxonomy assignments file:
SH000485.06FU_AY612334_reps_si k__Fungi;p__Basidiomycota;c__Incertae_sedis;o__Malasseziales 0.67 3

The SH000485.06FU_AY612334_reps_si doesn't match up to the OTU ID SH000485.06FU_AY612334_reps_singleton, so when you call biom add-metadata,
it does not work.
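
The mismatch described above can be detected before calling biom add-metadata with a short comparison like this sketch. It assumes OTU IDs are the text before the first whitespace in each FASTA header and in each line of the assignments file; the helper names are hypothetical, and the example uses the single ID quoted in this post.

```python
def fasta_ids(lines):
    """Collect the ID (text before the first whitespace) of each FASTA header."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def assignment_ids(lines):
    """Collect the first whitespace-delimited token of each non-empty line."""
    ids = []
    for line in lines:
        tokens = line.split()
        if tokens and not tokens[0].startswith("#"):
            ids.append(tokens[0])
    return ids


def compare_ids(otu_ids, assigned_ids):
    """Return (OTU IDs without an assignment, assignment IDs matching no OTU)."""
    otu_set, assigned_set = set(otu_ids), set(assigned_ids)
    return sorted(otu_set - assigned_set), sorted(assigned_set - otu_set)


original_rep_set = [">SH000485.06FU_AY612334_reps_singleton 330_125093", "ACGT"]
assignments = [
    "SH000485.06FU_AY612334_reps_si\t"
    "k__Fungi;p__Basidiomycota;c__Incertae_sedis;o__Malasseziales\t0.67\t3"
]
missing, unexpected = compare_ids(fasta_ids(original_rep_set),
                                  assignment_ids(assignments))
print("OTUs without taxonomy:", missing)
print("Assignments with no matching OTU:", unexpected)
```

Two empty lists would mean the IDs line up; anything else points to relabeled or dropped sequences.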


The add-metadata command looked off; here's an example command (it won't work until the OTU IDs and taxonomy mapping IDs are sorted out, though):
biom add-metadata -i /Users/tony/Downloads/Original_OTU_table.biom -o otu_table_lorinda2.biom --sc-separated taxonomy --observation-header OTUID,taxonomy --observation-metadata-fp /Users/tony/Downloads/ITS2_tax_assignments_fixed.txt

Looking through the pre and post ITS extractor data, there are a few sequences lost: 6084 vs 6078 on top of the labels changing.

I modified a few lines in the FungalITSextractor.pl file. Can you make a copy of your current version of the file, copy the attached one to your FungalITSextractor directory, and try rerunning the ITS extraction step? If it preserves the labels (we really just care about preserving everything before the first space), can you then run the resulting FASTA file through taxonomic assignment and try to rebuild the OTU table with that? Hopefully it will have IDs matching the OTUs.
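
As an alternative to patching the extractor (this is not the fix used in this thread), truncated labels can sometimes be mapped back to the full OTU IDs by unique prefix matching. The sketch below assumes truncation only removes characters from the end of an ID; the function name is hypothetical, and any label that matches zero or several full IDs is reported rather than guessed.

```python
def restore_ids(truncated_ids, full_ids):
    """Map each truncated ID to the single full ID it is a prefix of.

    Returns (restored, unresolved): a dict of unambiguous matches and a
    list of truncated IDs with no match or more than one match.
    """
    restored = {}
    unresolved = []
    for short in truncated_ids:
        matches = [full for full in full_ids if full.startswith(short)]
        if len(matches) == 1:
            restored[short] = matches[0]
        else:
            unresolved.append(short)
    return restored, unresolved


# IDs quoted in this thread, plus one deliberately ambiguous prefix.
full = [
    "SH000485.06FU_AY612334_reps_singleton",
    "SH216503.06FU_AF444373_refs",
]
truncated = [
    "SH000485.06FU_AY612334_reps_si",
    "SH216503.06FU_AF444373_refs",
    "SH",
]
restored, unresolved = restore_ids(truncated, full)
for short, long_id in restored.items():
    print(short, "->", long_id)
print("Could not resolve:", unresolved)
```

The restored mapping could then be used to rewrite the first column of the taxonomy assignments file before adding it to the OTU table.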

FungalITSextractor.pl

Lorinda Hunt

Feb 27, 2014, 6:36:19 PM
Hi Tony, 
After re-running the FungalITSextractor with the new .pl file that you gave me and making a few manual edits to my OTU table, I am able to successfully run summarize_taxa_through_plots.py. The ITS Extractor, which eliminated everything but the ITS2 region from my sequences, also fixed the problem with the newer UNITE databases returning so many "unassigned" taxa. Very cool program; thanks for sharing it with me. I'm sure I will use it plenty more in the future.

Thanks so much for your help with all this!

Lorinda


TonyWalters

Mar 5, 2014, 9:47:50 AM
As a follow-up to this thread: there is a newer version of ITS Extractor (renamed ITSx), available at http://microbiology.se/software/itsx/, which has more options, including preservation of the FASTA labels.

Yongjie Zhang

Apr 22, 2014, 3:41:37 PM
Hi, for the two releases of the same UNITE version, which one should we use? When I used sh_qiime_release_09.02.2014, 156 out of 1263 OTUs were unassigned; when I used sh_qiime_release_s_09.02.2014, 85 out of 1263 were unassigned. Which database is more reliable? Also, of the 97, 99, and dynamic files, which one should we use?
