Help me figure out why make_otu_table.py is not adding taxonomy...


Jessica Hardwicke

Nov 30, 2016, 1:24:59 PM
to Qiime 1 Forum

For some reason my biom table ends up without any taxonomy, even though I pass in my taxonomy assignments when I run the command:

make_otu_table.py -i nonchim_seqs_otus.txt -o my_table.biom -t uclust_assigned_taxa.txt


The table is created with no taxa information. I've confirmed this with phyloseq, by trying to filter OTUs by taxonomy with qiime, and by converting the biom to a table with the --header-key taxonomy flag. There's no error to tell me what happened. 

My uclust_assigned_taxa.txt was generated by:

assign_taxonomy.py -i nonchim_seqs.fna -o ./assigned_taxonomy_uclust/

And the output looks like:


lane1-s019-index-NNNNNNNN-GCCTATGAGATC-S20S21_299930;size=1;    k__Bacteria; p__Acidobacteria; c__Acidobacteria-5; o__; f__; g__; s__   1.00    3


lane1-s006-index-NNNNNNNN-AATTTAGGTAGG-S6_338769;size=1;        k__Bacteria; p__Planctomycetes; c__Planctomycetia; o__Gemmatales; f__Gemmataceae; g__; s__      1.00    3


lane1-s024-index-NNNNNNNN-CGGGACACCCGA-S26_1701270;size=1;      k__Bacteria; p__Proteobacteria; c__Deltaproteobacteria; o__Bdellovibrionales; f__Bdellovibrionaceae; g__Bdellovibrio; s__       1.00    2


lane1-s015-index-NNNNNNNN-CGTCCGTATGAA-S16_81927;size=23;       k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Aeromonadales; f__Aeromonadaceae; g__; s__   0.67    3


lane1-s015-index-NNNNNNNN-CGTCCGTATGAA-S16_43659;size=23;       k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__Plesiomonas; s__shigelloides    1.00    3


lane1-s019-index-NNNNNNNN-GCCTATGAGATC-S20S21_289199;size=1;    k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae     1.00    3


lane1-s013-index-NNNNNNNN-ACTCGCTCGCTG-S13_1963628;size=1;      k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__; s__   1.00    3


lane1-s015-index-NNNNNNNN-CGTCCGTATGAA-S16_81267;size=23;       k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__; s__   0.67    3


lane1-s019-index-NNNNNNNN-GCCTATGAGATC-S20S21_221665;size=1;    k__Bacteria; p__Actinobacteria; c__Thermoleophilia; o__Gaiellales; f__Gaiellaceae; g__; s__     1.00    3


lane1-s010-index-NNNNNNNN-AATTCACCTCCT-S10_1781976;size=1;      k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__; s__   0.67    3


So what's the problem? I've tried multiple versions of qiime, and I've tried using the biom package to add this metadata back in (I ran into issues with the header in the file; I haven't attempted to reformat the taxonomy info myself yet). I haven't had this problem when I use an "all-in-one" script like pick_open_reference_otus.py.

Paz Aranega

Dec 2, 2016, 6:17:26 AM
to Qiime 1 Forum
Hi Jessica,

First of all, I don't have much experience with qiime, so I might be completely wrong, but I think that is OK. The same thing happened to me and I was very confused. This is what the output from make_otu_table.py looks like:

make_otu_table.py -i /home/paz/16S_closed_otus/uclust_ref_picked_otus/16S_R1_q19_seqs_chimeras_filtered_otus.txt -o /home/paz/otu_table_16S_R1_uclust_taxonomy.biom -t /home/paz/uclust_taxonomy_rep_set_16S_R1/rep_set_otus_tax_assignments.txt


 head otu_table_16S_R1_uclust_taxonomy.txt


# Constructed from biom file

#OTU ID WS1     WBS1    Ni1     Ne1     S1      WS2     Ne2     WBS2    S2      WS3     Ni3     Ne3     S3      WS4     Ni4     WBS4    Ne4     S4      WS5     WBS5    Ni5     Ne5     S5      WS6     WBS6    Ni6     Ne6     S6      WS7 Ni7      Ne7     S7      B       S       E       K       Ni2     WBS3    WBS7    F

EU510700.1.1432 167.0   3.0     2.0     9.0     10.0    14.0    66.0    2.0     77.0    478.0   8.0     2.0     36.0    33.0    23.0    6.0     15.0    26.0    1126.0  8.0     38.0    3.0     1.0     405.0   4.0     10.0    3.0     4.0 238.0    35.0    30.0    171.0   25.0    29.0    1.0     3.0     0.0     0.0     0.0     0.0

EU776957.1.1408 7.0     6.0     25.0    21.0    12.0    0.0     0.0     0.0     6.0     0.0     0.0     0.0     0.0     1.0     0.0     0.0     2.0     8.0     14.0    0.0     70.0    0.0     4.0     38.0    0.0     68.0    17.0    27.0    64.0    0.0     22.0    20.0    0.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0

DQ223087.1.1346 0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0 0.0      0.0     0.0     0.0     0.0     29.0    0.0     0.0     0.0     0.0     0.0     0.0

JQ248106.1.1504 5.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     2.0     5.0     0.0     0.0     0.0     0.0     18.0    0.0     0.0     0.0     0.0 6.0      0.0     13.0    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0

GQ284412.1.1494 0.0     0.0     0.0     0.0     0.0     3.0     1.0     0.0     0.0     0.0     0.0     6.0     0.0     0.0     0.0     0.0     4.0     1.0     2.0     0.0     0.0     2.0     0.0     1.0     0.0     0.0     0.0     0.0 0.0      0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0     0.0     0.0     0.0

EU535701.1.1402 0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0     0.0     0.0     0.0     0.0     1.0     0.0 0.0      0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0

EU472724.1.1401 0.0     0.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0     0.0     0.0 1.0      0.0     0.0     0.0     16.0    0.0     1.0     0.0     0.0     0.0     0.0     0.0

DL460096.1.1445 0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.0 0.0      0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0



Next I ran summarize_taxa.py with that OTU table as input:



summarize_taxa.py -i /home/paz/otu_table_16S_R1_uclust_taxonomy.biom -o /home/paz/taxonomy_tables_16S_R1



And this is what the taxonomy table for level 5 looks like:


 head  otu_table_16S_R1_uclust_taxonomy_L5.txt



# Constructed from biom file

#OTU ID WS1     WBS1    Ni1     Ne1     S1      WS2     Ne2     WBS2    S2      WS3     Ni3     Ne3     S3      WS4     Ni4     WBS4    Ne4     S4      WS5     WBS5    Ni5     Ne5     S5      WS6     WBS6    Ni6     Ne6     S6      WS7 Ni7      Ne7     S7      B       S       E       K       Ni2     WBS3    WBS7    F

D_0__Archaea;D_1__Thaumarchaeota;D_2__Marine Group I;D_3__Unknown Order;D_4__Unknown Family     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0 0.0      0.0     0.0     0.0     0.0     0.0     0.0     0.000207569362762       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.000120617561917       0.0     0.0     0.0     0.0     0.0

D_0__Archaea;D_1__Thaumarchaeota;D_2__Sc-EA05;D_3__uncultured archaeon;D_4__uncultured archaeon 0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0 0.0      0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     8.04117079447e-05       0.0     0.0     0.0     0.0     0.0

D_0__Archaea;D_1__Thaumarchaeota;D_2__South African Gold Mine Gp 1(SAGMCG-1);D_3__uncultured archaeon;D_4__uncultured archaeon  0.0     0.0     0.0     0.0     3.28964682352e-06       0.0     0.0     0.0     2.34899854362e-05       0.0 0.0      0.0     0.0     0.0     0.0     0.0     0.0     3.62532288032e-05       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.000115316312646       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.000155183116077    0.0     0.0     0.0     0.0

D_0__Archaea;D_1__Woesearchaeota (DHVEG-6);D_2__uncultured euryarchaeote;D_3__uncultured euryarchaeote;D_4__uncultured euryarchaeote    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     1.56599902908e-05       0.0     0.0 0.0      0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0

D_0__Bacteria;D_1__Acidobacteria;D_2__Acidobacteria;D_3__Acidobacteriales;D_4__Acidobacteriaceae (Subgroup 1)   0.0128835792926 1.13096584483e-05       0.000170677590032       3.86067592714e-06       0.000384888678351       0.000475674030101    1.61714954752e-05       0.0     4.69799708724e-05       0.000151082759778       0.0     0.0     6.76818950931e-05       2.70874246631e-05       0.0     7.26585773451e-06       5.34305056129e-06       0.0     3.39374193986e-05   0.0      0.0     0.0     4.8291215345e-06        7.79030574353e-05       0.0     0.0     0.0     0.0     8.9253837915e-05        0.0     0.0     8.33680700292e-05       0.0     0.0     0.0     0.0     4.14490591064e-05       0.0     0.0 0.0

D_0__Bacteria;D_1__Acidobacteria;D_2__Acidobacteria;D_3__Subgroup 3;D_4__SJA-149        0.0     2.26193168966e-05       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0 0.0      0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0

D_0__Bacteria;D_1__Acidobacteria;D_2__Acidobacteria;D_3__Subgroup 3;D_4__Unknown Family 0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     2.34899854362e-05       0.0     0.0     0.0     0.0     0.0     0.0     2.17975732035e-05    0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0

D_0__Bacteria;D_1__Acidobacteria;D_2__Acidobacteria;D_3__Subgroup 4;D_4__11-24  0.0     0.0     0.0     0.0     0.0     6.34232040134e-06       0.0     0.000188729362095       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0 0.0      0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.000140706345856       0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.00088122980515



The resulting taxonomy tables have fewer rows than the initial OTU table (and I don't really understand why), but the ones that appear have taxonomy assigned.
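
A likely reason for the smaller row count: summarize_taxa.py reports one row per lineage at the requested level, so OTUs that share the same first five ranks are merged into a single row of relative abundances. Here is a rough Python sketch of that idea. It is not QIIME's code; the function name, OTU names, lineages and counts are all invented for illustration, and it skips details such as how short lineages are padded.

```python
from collections import defaultdict


def summarize_at_level(otu_counts, otu_lineage, level):
    """Merge OTUs sharing their first `level` ranks; return relative abundances."""
    n_samples = len(next(iter(otu_counts.values())))
    totals = [0.0] * n_samples
    merged = defaultdict(lambda: [0.0] * n_samples)
    for otu, counts in otu_counts.items():
        ranks = [rank.strip() for rank in otu_lineage[otu].split(";")]
        key = ";".join(ranks[:level])  # truncate the lineage to the chosen level
        for i, count in enumerate(counts):
            merged[key][i] += count
            totals[i] += count
    # Divide each merged row by the per-sample total to get relative abundance.
    return {
        key: [v / t if t else 0.0 for v, t in zip(values, totals)]
        for key, values in merged.items()
    }


# Toy data (hypothetical): three OTUs, two samples.
toy_counts = {"otu_a": [3, 0], "otu_b": [1, 2], "otu_c": [4, 2]}
toy_lineage = {
    "otu_a": "D_0__Bacteria;D_1__P;D_2__C;D_3__O;D_4__F;D_5__G1",
    "otu_b": "D_0__Bacteria;D_1__P;D_2__C;D_3__O;D_4__F;D_5__G2",
    "otu_c": "D_0__Bacteria;D_1__P;D_2__C;D_3__O2;D_4__F2;D_5__G3",
}

level5 = summarize_at_level(toy_counts, toy_lineage, level=5)
for lineage, fractions in sorted(level5.items()):
    print(lineage, fractions)
```

In the toy data, otu_a and otu_b differ only at the sixth rank, so at level 5 they end up in the same row and three OTUs become two rows.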


Hope this helps!


All the best,


Paz


Jessica Hardwicke

Dec 2, 2016, 1:45:26 PM
to Qiime 1 Forum
Paz, 

Thanks for the input! I ran summarize_taxa.py and did not get the same kind of output as you: 
head paired_end_L6.txt 
# Constructed from biom file
#OTU ID lane1-s021-index-NNNNNNNN-CACGTTTATTCC-S22      lane1-s009-index-NNNNNNNN-ACTACTGAGGAT-S9       lane1-s024-index-NNNNNNNN-CGGGACACCCGA-S26 lane1-s023-index-NNNNNNNN-TGACTAATGGCC-S25      lane1-s015-index-NNNNNNNN-CGTCCGTATGAA-S16 lane1-s022-index-NNNNNNNN-TAATCGGTGCCA-S24      lane1-s003-index-NNNNNNNN-CGCGCCTTAAAC-S3       lane1-s019-index-NNNNNNNN-GCCTATGAGATC-S20S21      lane1-s018-index-NNNNNNNN-CATATAGCCCGA-S19      lane1-s013-index-NNNNNNNN-ACTCGCTCGCTG-S13 lane1-s011-index-NNNNNNNN-CGTATAAATGCG-S11      lane1-s007-index-NNNNNNNN-GACTCAACCAGT-S7       lane1-s010-index-NNNNNNNN-AATTCACCTCCT-S10 lane1-s005-index-NNNNNNNN-TACAATATCTGT-S5       lane1-s001-index-NNNNNNNN-AGCCTTCGTCGC-S1  lane1-s006-index-NNNNNNNN-AATTTAGGTAGG-S6       lane1-s012-index-NNNNNNNN-ATGCTGCAACAC-S12      lane1-s014-index-NNNNNNNN-TTCCTTAGTAGT-S14 lane1-s017-index-NNNNNNNN-GGTTGCCCTGTA-S18      lane1-s008-index-NNNNNNNN-GCCTCTACGTCG-S8  lane1-s004-index-NNNNNNNN-TATGGTACCCAG-S4       lane1-s002-index-NNNNNNNN-TCCATACCGGAA-S2       lane1-s016-index-NNNNNNNN-ACGTGAGGAACG-S17
None;Other;Other;Other;Other;Other      1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0

It seems like my taxonomy info never got into the original biom table :/

Colin Brislawn

Dec 2, 2016, 5:28:29 PM
to Qiime 1 Forum
Hello Jessica,

So this command is able to make a .biom table, but that table does not have any taxonomy in it?
make_otu_table.py -i nonchim_seqs_otus.txt -o my_table.biom -t uclust_assigned_taxa.txt

You could try adding the taxonomy to the already created table using this command:
biom add-metadata -i my_table.biom -o my_table_w_tax.biom --observation-metadata-fp uclust_assigned_taxa.txt
I'm not sure why the make_otu_table.py script is failing to do this in a single step, but hopefully the biom add-metadata command will work. 
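
One thing to watch for: biom add-metadata takes its column names from a header line in the metadata file, and the uclust assignments file does not have one, which may be the header issue mentioned earlier in the thread. Here is a small Python sketch that prepends a header before the file is handed to biom. The function name and the column names (#OTUID, taxonomy, confidence, num_hits) are my own choices for illustration, so adjust them to whatever your downstream tools expect.

```python
def with_header(lines, columns=("#OTUID", "taxonomy", "confidence", "num_hits")):
    """Return the lines with a tab-separated header prepended, unless one exists."""
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if cleaned and cleaned[0].startswith("#"):
        return cleaned  # already has a header (or comment) line; leave untouched
    return ["\t".join(columns)] + cleaned


# Toy assignment lines in the four-column layout shown earlier (IDs are made up).
example = [
    "otu_1\tk__Bacteria; p__Proteobacteria\t1.00\t3\n",
    "otu_2\tk__Bacteria; p__Acidobacteria\t0.67\t3\n",
]

labelled = with_header(example)
print("\n".join(labelled))
```

For a real file you could read it with open(path).readlines(), pass the list to with_header, and write the result to a new file. As far as I know, biom add-metadata also has an --observation-header option for supplying the names on the command line; check biom add-metadata --help for your version.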

Let me know how well this works for you,
Colin

Jessica Hardwicke

Dec 8, 2016, 2:42:47 PM
to Qiime 1 Forum
Sorry to disappear! Turns out you'd need to see my entire pipeline to find out why it wasn't working - I was feeding my (chimera/derep) filtered sequences, not my representative set of sequences (from pick_rep_set.py), into assign_taxonomy.py. So the IDs in my taxonomy file were sequence IDs rather than OTU IDs. Probably a good example of why I shouldn't be writing my whole pipeline in a bash script (at least in the simple format I use), but I haven't made the time to learn a better pipeline tool.
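
A quick sanity check that would have caught this: compare the IDs in the first column of the OTU map with the IDs in the first column of the taxonomy assignments. If they don't overlap, presumably there is nothing for make_otu_table.py to attach. Here is a minimal Python sketch of that check. The function names and the toy IDs (denovo0, S1_1 and so on) are invented for illustration and are not part of QIIME; the only assumption is that both files are tab-separated with the ID in the first column.

```python
def first_column_ids(text):
    """Collect the first tab-separated field of each non-empty, non-comment line."""
    ids = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def shared_id_report(otu_map_text, taxonomy_text):
    """Return (shared, n_otu_ids, n_taxonomy_ids) for the two files' ID columns."""
    otu_ids = first_column_ids(otu_map_text)
    tax_ids = first_column_ids(taxonomy_text)
    return len(otu_ids & tax_ids), len(otu_ids), len(tax_ids)


# Toy inputs (hypothetical IDs). The OTU map lists an OTU ID, then its member reads.
OTU_MAP_TEXT = "denovo0\tS1_1\tS1_2\ndenovo1\tS2_7\n"
# Taxonomy assigned to the filtered reads: IDs are read IDs, not OTU IDs.
SEQ_LEVEL_TAX_TEXT = "S1_1\tk__Bacteria\t1.00\t3\nS2_7\tk__Bacteria\t0.67\t3\n"
# Taxonomy assigned to the representative set: IDs are OTU IDs.
REP_SET_TAX_TEXT = "denovo0\tk__Bacteria\t1.00\t3\ndenovo1\tk__Bacteria\t0.67\t3\n"

print(shared_id_report(OTU_MAP_TEXT, SEQ_LEVEL_TAX_TEXT))
print(shared_id_report(OTU_MAP_TEXT, REP_SET_TAX_TEXT))
```

With real files, you would pass open(path).read() for each of the two inputs; a shared count of zero (or close to it) means the taxonomy was assigned to the wrong set of sequences.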

Cheers! 

Colin Brislawn

Dec 8, 2016, 3:19:15 PM
to Qiime 1 Forum
Got it! Glad you got it working.

> Probably a good example of why I shouldn't be writing my whole pipeline in a bash script
I think Bash is the perfect place to start. A Jupyter notebook may be a reasonable step up.

I strongly believe that making a qiime pipeline is a sort of rite of passage in this field. Here are some:


As with any rite of passage, it's about the journey, not the destination. When you feel ready, I would be honored to add your pipeline to the list.

Colin

Jessica Hardwicke

Dec 8, 2016, 3:39:53 PM
to Qiime 1 Forum
Nice! I like the jupyter suggestion and I might follow through with that one so I can get more practice on that platform. 

Off topic but I noticed that your bitbucket pipeline uses usearch. A lot of the help I've found online for implementing swarm/vsearch has included discussions from you! Do you just not have a pipeline online using those tools? 

Colin Brislawn

Dec 8, 2016, 4:38:57 PM
to Qiime 1 Forum
Hello Jessica,

I've moved to vsearch because it's open source and the devs are really active in the community. Here is a much newer variation of a 'qiime pipeline' which makes use of vsearch:

I hope all is well in Portland!
Colin
