make_otu_table.py -i nonchim_seqs_otus.txt -o my_table.biom -t uclust_assigned_taxa.txt
assign_taxonomy.py -i nonchim_seqs.fna -o ./assigned_taxonomy_uclust/
lane1-s019-index-NNNNNNNN-GCCTATGAGATC-S20S21_299930;size=1; k__Bacteria; p__Acidobacteria; c__Acidobacteria-5; o__; f__; g__; s__ 1.00 3
lane1-s006-index-NNNNNNNN-AATTTAGGTAGG-S6_338769;size=1; k__Bacteria; p__Planctomycetes; c__Planctomycetia; o__Gemmatales; f__Gemmataceae; g__; s__ 1.00 3
lane1-s024-index-NNNNNNNN-CGGGACACCCGA-S26_1701270;size=1; k__Bacteria; p__Proteobacteria; c__Deltaproteobacteria; o__Bdellovibrionales; f__Bdellovibrionaceae; g__Bdellovibrio; s__ 1.00 2
lane1-s015-index-NNNNNNNN-CGTCCGTATGAA-S16_81927;size=23; k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Aeromonadales; f__Aeromonadaceae; g__; s__ 0.67 3
lane1-s015-index-NNNNNNNN-CGTCCGTATGAA-S16_43659;size=23; k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__Plesiomonas; s__shigelloides 1.00 3
lane1-s019-index-NNNNNNNN-GCCTATGAGATC-S20S21_289199;size=1; k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae 1.00 3
lane1-s013-index-NNNNNNNN-ACTCGCTCGCTG-S13_1963628;size=1; k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__; s__ 1.00 3
lane1-s015-index-NNNNNNNN-CGTCCGTATGAA-S16_81267;size=23; k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__; s__ 0.67 3
lane1-s019-index-NNNNNNNN-GCCTATGAGATC-S20S21_221665;size=1; k__Bacteria; p__Actinobacteria; c__Thermoleophilia; o__Gaiellales; f__Gaiellaceae; g__; s__ 1.00 3
lane1-s010-index-NNNNNNNN-AATTCACCTCCT-S10_1781976;size=1; k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__; s__ 0.67 3
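make_otu_table.py attaches taxonomy by looking up each OTU ID from the OTU map in the file passed with -t, so the two files have to share exactly the same IDs. The IDs in the assignments above look like individual sequence IDs that still carry the `;size=` annotation, so it is worth checking that the OTU map uses the same strings in its first column. Below is a minimal stdlib sketch of that check. It assumes both files are tab-separated with the ID in the first field, and the function names are hypothetical, not part of QIIME:

```python
def first_column_ids(lines):
    """Collect the first tab-separated field of each non-empty, non-comment line."""
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def otus_missing_taxonomy(otu_map_lines, taxonomy_lines):
    """Return OTU IDs from the OTU map that have no row in the taxonomy file."""
    return sorted(first_column_ids(otu_map_lines) - first_column_ids(taxonomy_lines))


# Example usage with the file names from the commands above (assumed to exist locally):
# with open("nonchim_seqs_otus.txt") as otus, open("uclust_assigned_taxa.txt") as taxa:
#     missing = otus_missing_taxonomy(otus, taxa)
#     print(len(missing), "OTU IDs have no taxonomy assignment", missing[:10])
```

If most or all OTU IDs come back as missing, the taxonomy was probably assigned to a different set of IDs than the ones the OTU table is built on.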
make_otu_table.py -i /home/paz/16S_closed_otus/uclust_ref_picked_otus/16S_R1_q19_seqs_chimeras_filtered_otus.txt -o /home/paz/otu_table_16S_R1_uclust_taxonomy.biom -t /home/paz/uclust_taxonomy_rep_set_16S_R1/rep_set_otus_tax_assignments.txt
head otu_table_16S_R1_uclust_taxonomy.txt
# Constructed from biom file
#OTU ID WS1 WBS1 Ni1 Ne1 S1 WS2 Ne2 WBS2 S2 WS3 Ni3 Ne3 S3 WS4 Ni4 WBS4 Ne4 S4 WS5 WBS5 Ni5 Ne5 S5 WS6 WBS6 Ni6 Ne6 S6 WS7 Ni7 Ne7 S7 B S E K Ni2 WBS3 WBS7 F
EU510700.1.1432 167.0 3.0 2.0 9.0 10.0 14.0 66.0 2.0 77.0 478.0 8.0 2.0 36.0 33.0 23.0 6.0 15.0 26.0 1126.0 8.0 38.0 3.0 1.0 405.0 4.0 10.0 3.0 4.0 238.0 35.0 30.0 171.0 25.0 29.0 1.0 3.0 0.0 0.0 0.0 0.0
EU776957.1.1408 7.0 6.0 25.0 21.0 12.0 0.0 0.0 0.0 6.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 2.0 8.0 14.0 0.0 70.0 0.0 4.0 38.0 0.0 68.0 17.0 27.064.0 0.0 22.0 20.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
DQ223087.1.1346 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 29.0 0.0 0.0 0.0 0.0 0.0 0.0
JQ248106.1.1504 5.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 5.0 0.0 0.0 0.0 0.0 18.0 0.0 0.0 0.0 0.0 6.0 0.0 13.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
GQ284412.1.1494 0.0 0.0 0.0 0.0 0.0 3.0 1.0 0.0 0.0 0.0 0.0 6.0 0.0 0.0 0.0 0.0 4.0 1.0 2.0 0.0 0.0 2.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
EU535701.1.1402 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
EU472724.1.1401 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 16.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
DL460096.1.1445 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
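The table above uses QIIME's classic tab-separated layout: a `# Constructed from biom file` comment, a `#OTU ID` header carrying the sample IDs, then one row of per-sample counts per OTU. A minimal stdlib reader for that layout is sketched below, assuming tab separators and no extra metadata column (as in the table shown); the helper names are hypothetical. It is handy for, e.g., counting rows before and after summarize_taxa.py:

```python
def read_classic_table(lines):
    """
    Parse a QIIME classic tab-separated table (no metadata column assumed).

    Returns (sample_ids, rows), where rows maps each OTU ID (or taxon string)
    to a list of floats, one per sample.
    """
    sample_ids = None
    rows = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#OTU ID"):
            sample_ids = line.split("\t")[1:]
            continue
        if line.startswith("#"):
            continue  # e.g. "# Constructed from biom file"
        if sample_ids is None:
            raise ValueError("data row found before the '#OTU ID' header")
        fields = line.split("\t")
        values = [float(v) for v in fields[1:]]
        if len(values) != len(sample_ids):
            raise ValueError(
                f"row {fields[0]!r} has {len(values)} values, expected {len(sample_ids)}"
            )
        rows[fields[0]] = values
    return sample_ids, rows


def sample_totals(sample_ids, rows):
    """Sum each sample's column, e.g. to compare sequencing depth across samples."""
    return {
        sample: sum(values[i] for values in rows.values())
        for i, sample in enumerate(sample_ids)
    }
```

`len(rows)` then gives the number of OTUs (or taxa, for a summarized table), which makes the comparison between the OTU table and the per-level tables explicit.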
Next, I ran summarize_taxa.py with that OTU table as input:
summarize_taxa.py -i /home/paz/otu_table_16S_R1_uclust_taxonomy.biom -o /home/paz/taxonomy_tables_16S_R1
And this is what the level 5 taxonomy table looks like:
head otu_table_16S_R1_uclust_taxonomy_L5.txt
# Constructed from biom file
#OTU ID WS1 WBS1 Ni1 Ne1 S1 WS2 Ne2 WBS2 S2 WS3 Ni3 Ne3 S3 WS4 Ni4 WBS4 Ne4 S4 WS5 WBS5 Ni5 Ne5 S5 WS6 WBS6 Ni6 Ne6 S6 WS7 Ni7 Ne7 S7 B S E K Ni2 WBS3 WBS7 F
D_0__Archaea;D_1__Thaumarchaeota;D_2__Marine Group I;D_3__Unknown Order;D_4__Unknown Family 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000207569362762 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000120617561917 0.0 0.0 0.0 0.0 0.0
D_0__Archaea;D_1__Thaumarchaeota;D_2__Sc-EA05;D_3__uncultured archaeon;D_4__uncultured archaeon 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.04117079447e-05 0.0 0.0 0.0 0.0 0.0
D_0__Archaea;D_1__Thaumarchaeota;D_2__South African Gold Mine Gp 1(SAGMCG-1);D_3__uncultured archaeon;D_4__uncultured archaeon 0.0 0.0 0.0 0.0 3.28964682352e-06 0.0 0.0 0.0 2.34899854362e-05 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.62532288032e-05 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000115316312646 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000155183116077 0.0 0.0 0.0 0.0
D_0__Archaea;D_1__Woesearchaeota (DHVEG-6);D_2__uncultured euryarchaeote;D_3__uncultured euryarchaeote;D_4__uncultured euryarchaeote 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.56599902908e-05 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
D_0__Bacteria;D_1__Acidobacteria;D_2__Acidobacteria;D_3__Acidobacteriales;D_4__Acidobacteriaceae (Subgroup 1) 0.0128835792926 1.13096584483e-05 0.000170677590032 3.86067592714e-06 0.000384888678351 0.000475674030101 1.61714954752e-05 0.0 4.69799708724e-05 0.000151082759778 0.0 0.0 6.76818950931e-05 2.70874246631e-05 0.0 7.26585773451e-06 5.34305056129e-06 0.0 3.39374193986e-05 0.0 0.0 0.0 4.8291215345e-06 7.79030574353e-05 0.0 0.0 0.0 0.0 8.9253837915e-05 0.0 0.0 8.33680700292e-05 0.0 0.0 0.0 0.0 4.14490591064e-05 0.0 0.0 0.0
D_0__Bacteria;D_1__Acidobacteria;D_2__Acidobacteria;D_3__Subgroup 3;D_4__SJA-149 0.0 2.26193168966e-05 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
D_0__Bacteria;D_1__Acidobacteria;D_2__Acidobacteria;D_3__Subgroup 3;D_4__Unknown Family 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.34899854362e-05 0.0 0.0 0.0 0.0 0.0 0.0 2.17975732035e-05 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
D_0__Bacteria;D_1__Acidobacteria;D_2__Acidobacteria;D_3__Subgroup 4;D_4__11-24 0.0 0.0 0.0 0.0 0.0 6.34232040134e-06 0.0 0.000188729362095 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000140706345856 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00088122980515
The resulting taxonomy tables have fewer OTUs than the initial OTU table (and I don't really understand why), but the ones that appear have taxonomy assigned.
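A likely reason for the shorter tables: at each level, summarize_taxa.py merges all OTUs whose taxonomy strings agree up to that rank into a single row, and by default reports each row as a fraction of the sample total rather than as a count (hence the small decimal values in the level 5 table). Assignments with fewer ranks than the requested level appear to be padded with `Other`, and OTUs that received no taxonomy when the OTU table was built appear to show up as `None`. The sketch below mimics that behaviour in plain Python; it illustrates the idea and is not QIIME's actual code, and `collapse_by_level` is a hypothetical name:

```python
from collections import defaultdict


def collapse_by_level(counts, taxonomy, level, relative=True):
    """
    Collapse OTU counts to one row per taxon at a given rank depth.

    counts:   dict of OTU ID -> list of per-sample counts (same length for all)
    taxonomy: dict of OTU ID -> list of ranks, e.g. ["k__Bacteria", "p__Proteobacteria"]
    level:    number of ranks to keep (5 ~ family, 6 ~ genus for 7-rank strings)
    relative: if True, divide each sample column by its total
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    collapsed = defaultdict(lambda: [0.0] * n_samples)

    for otu_id, row in counts.items():
        # OTUs without a taxonomy entry are labelled "None".
        ranks = list(taxonomy.get(otu_id, ["None"]))[:level]
        # Pad short assignments so every key has exactly `level` ranks.
        ranks += ["Other"] * (level - len(ranks))
        summed = collapsed[";".join(ranks)]
        for i, value in enumerate(row):
            summed[i] += value

    if not relative:
        return dict(collapsed)

    # Convert each sample column to relative abundance.
    totals = [sum(row[i] for row in collapsed.values()) for i in range(n_samples)]
    return {
        taxon: [v / t if t else 0.0 for v, t in zip(row, totals)]
        for taxon, row in collapsed.items()
    }
```

With four OTUs where two share the same kingdom and phylum, level 2 yields three rows, which is the same effect as the level 5 table having fewer rows than the OTU table. If counts rather than fractions are wanted, summarize_taxa.py has an `-a`/`--absolute_abundance` option (worth confirming with `summarize_taxa.py -h` for the installed QIIME version).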
Hope this helps!
All the best,
Paz
head paired_end_L6.txt
# Constructed from biom file
#OTU ID lane1-s021-index-NNNNNNNN-CACGTTTATTCC-S22 lane1-s009-index-NNNNNNNN-ACTACTGAGGAT-S9 lane1-s024-index-NNNNNNNN-CGGGACACCCGA-S26 lane1-s023-index-NNNNNNNN-TGACTAATGGCC-S25 lane1-s015-index-NNNNNNNN-CGTCCGTATGAA-S16 lane1-s022-index-NNNNNNNN-TAATCGGTGCCA-S24 lane1-s003-index-NNNNNNNN-CGCGCCTTAAAC-S3 lane1-s019-index-NNNNNNNN-GCCTATGAGATC-S20S21 lane1-s018-index-NNNNNNNN-CATATAGCCCGA-S19 lane1-s013-index-NNNNNNNN-ACTCGCTCGCTG-S13 lane1-s011-index-NNNNNNNN-CGTATAAATGCG-S11 lane1-s007-index-NNNNNNNN-GACTCAACCAGT-S7 lane1-s010-index-NNNNNNNN-AATTCACCTCCT-S10 lane1-s005-index-NNNNNNNN-TACAATATCTGT-S5 lane1-s001-index-NNNNNNNN-AGCCTTCGTCGC-S1 lane1-s006-index-NNNNNNNN-AATTTAGGTAGG-S6 lane1-s012-index-NNNNNNNN-ATGCTGCAACAC-S12 lane1-s014-index-NNNNNNNN-TTCCTTAGTAGT-S14 lane1-s017-index-NNNNNNNN-GGTTGCCCTGTA-S18 lane1-s008-index-NNNNNNNN-GCCTCTACGTCG-S8 lane1-s004-index-NNNNNNNN-TATGGTACCCAG-S4 lane1-s002-index-NNNNNNNN-TCCATACCGGAA-S2 lane1-s016-index-NNNNNNNN-ACGTGAGGAACG-S17
None;Other;Other;Other;Other;Other 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Probably a good example of why I shouldn't be writing my whole pipeline in a bash script.