Dear QIIME users,
I have paired-end reads covering the V1-V3 region of the 16S rRNA gene. My processing consists of merging the read pairs (with PEAR), removing primers (with multiple_extract_barcodes.py in QIIME 1.9.1), and quality filtering and truncating the sequences to 240 bp (following the UPARSE pipeline).
When assigning taxonomy against the Greengenes database (gg_13_8_otus) and against Silva_119 in the QIIME VirtualBox (v1.9.1), I found large differences in the assignments for Proteobacteria, shown below.
Commands:
1) Greengenes (default reference): assign_taxonomy.py -i otus.fa
2) Silva: assign_taxonomy.py -i otus.fa -r ../../../Downloads/Silva119_release/rep_set/97/Silva_119_rep_set97.fna -t ../../../Downloads/Silva119_release/taxonomy/97/taxonomy_97_7_levels.txt -o Silva_tax_assign/
For example:
Using Greengenes:

| Taxon | Sample1 | Sample2 | Sample3 | Sample4 |
|---|---|---|---|---|
| k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales | 10.506 | 19.796 | 8.937 | 16.577 |
| k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;Other | 16.310 | 20.642 | 16.145 | 1.233 |
| k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales | 1.764 | 40.493 | 3.217 | 19.899 |
| k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Xanthomonadales | 19.250 | 0.351 | 23.148 | 3.065 |
| k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;Other | 0.637 | 0.017 | 0.467 | 0.006 |
Using Silva:

| Taxon | Sample1 | Sample2 | Sample3 | Sample4 |
|---|---|---|---|---|
| D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rhizobiales | 26.41278 | 39.74114 | 24.52475 | 16.96269 |
| D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;Other | 0.304246 | 0.621684 | 0.387718 | 0.8775022 |
| D_0__Bacteria;D_1__Proteobacteria;D_2__Betaproteobacteria;D_3__Burkholderiales | 14.31116 | 40.46679 | 17.30375 | 7.773602 |
| D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Xanthomonadales | 8.102449 | 0.357522 | 10.28378 | 2.75844 |
| D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;Other | 0.035862 | 0.111394 | 0.017779 | 0.101754 |
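Summing the level-4 rows quoted above by class is a quick check on whether the two databases disagree at class rank or only at order rank. The sketch below uses only the rows shown in the two tables (so the totals are partial) and drops the kingdom and phylum prefixes for brevity; the dictionary names and the `class_totals` helper are hypothetical, not part of QIIME.

```python
# Level-4 rows copied from the two tables above (only the rows shown there).
GREENGENES = {
    "c__Alphaproteobacteria;o__Rhizobiales": [10.506, 19.796, 8.937, 16.577],
    "c__Alphaproteobacteria;Other": [16.310, 20.642, 16.145, 1.233],
    "c__Betaproteobacteria;o__Burkholderiales": [1.764, 40.493, 3.217, 19.899],
    "c__Gammaproteobacteria;o__Xanthomonadales": [19.250, 0.351, 23.148, 3.065],
    "c__Gammaproteobacteria;Other": [0.637, 0.017, 0.467, 0.006],
}
SILVA = {
    "D_2__Alphaproteobacteria;D_3__Rhizobiales": [26.41278, 39.74114, 24.52475, 16.96269],
    "D_2__Alphaproteobacteria;Other": [0.304246, 0.621684, 0.387718, 0.8775022],
    "D_2__Betaproteobacteria;D_3__Burkholderiales": [14.31116, 40.46679, 17.30375, 7.773602],
    "D_2__Gammaproteobacteria;D_3__Xanthomonadales": [8.102449, 0.357522, 10.28378, 2.75844],
    "D_2__Gammaproteobacteria;Other": [0.035862, 0.111394, 0.017779, 0.101754],
}


def class_totals(rows):
    """Sum per-sample values of level-4 rows that share the same class.

    The class name is the first rank with its prefix (c__ or D_2__) removed,
    so Greengenes and Silva totals end up under the same key."""
    totals = {}
    for lineage, values in rows.items():
        class_name = lineage.split(";")[0].split("__", 1)[1]
        acc = totals.setdefault(class_name, [0.0] * len(values))
        for i, value in enumerate(values):
            acc[i] += value
    return totals


if __name__ == "__main__":
    gg, silva = class_totals(GREENGENES), class_totals(SILVA)
    for name in sorted(gg):
        print(name, [round(x, 3) for x in gg[name]], [round(x, 3) for x in silva[name]])
```

With these numbers, the Alphaproteobacteria totals from the two databases agree to within 0.2 in every sample, which is consistent with the Alphaproteobacteria difference being mainly about how far down (to order) each database resolves the same sequences. The Betaproteobacteria and Gammaproteobacteria rows need more care, since rows not included in these excerpts may carry part of the difference.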
It seems that the sequences assigned to "Alphaproteobacteria;Other" with Greengenes were assigned to "Alphaproteobacteria;Rhizobiales" with Silva.
Also, in Sample1 and Sample3, a large portion of the sequences assigned to "Gammaproteobacteria;Xanthomonadales" with Greengenes appears to have moved to "Betaproteobacteria;Burkholderiales" with Silva.
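One way to confirm these apparent reassignments is to compare the two assign_taxonomy.py output files OTU by OTU instead of comparing summed abundances. The sketch below assumes each output file is tab-separated with the OTU ID in the first column and the taxonomy string in the second (any further columns, such as confidence, are ignored), and assumes that a lineage stopping before the requested level corresponds to the "Other" label in the tables. All function names are hypothetical helpers, not QIIME commands, and the counts are numbers of OTUs, not numbers of sequences.

```python
import csv
from collections import Counter


def collapse(lineage, level, pad="Other"):
    """Keep the first `level` ranks of a ';'-separated lineage.

    Lineages that stop early are padded with `pad` (assumed to match the
    "Other" labels in the summary tables). Note: an empty rank such as
    'o__' is kept as-is, not converted to `pad`."""
    ranks = [r.strip() for r in lineage.split(";") if r.strip()]
    ranks = ranks[:level]
    ranks += [pad] * (level - len(ranks))
    return ";".join(ranks)


def parse_assignments(lines):
    """Map OTU ID -> lineage from tab-separated assignment lines (assumed layout)."""
    assignments = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 2 and row[0] and not row[0].startswith("#"):
            assignments[row[0]] = row[1]
    return assignments


def read_assignments(path):
    """Read one assignment file from disk."""
    with open(path, newline="") as handle:
        return parse_assignments(handle)


def cross_tabulate(first, second, level=4):
    """Count OTUs by (collapsed lineage in `first`, collapsed lineage in `second`).

    Only OTUs present in both mappings are counted."""
    table = Counter()
    for otu_id in sorted(first.keys() & second.keys()):
        key = (collapse(first[otu_id], level), collapse(second[otu_id], level))
        table[key] += 1
    return table


def report(table, first_taxon):
    """List (second-database taxon, number of OTUs) for OTUs whose
    first-database taxon equals `first_taxon`, most frequent first."""
    hits = Counter()
    for (a, b), n in table.items():
        if a == first_taxon:
            hits[b] += n
    return hits.most_common()
```

For example, `report(cross_tabulate(read_assignments(gg_path), read_assignments(silva_path)), "k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;Other")` lists which Silva orders those OTUs received. Here `gg_path` and `silva_path` are placeholders for the `*_tax_assignments.txt` files in your two output directories. To compare abundances rather than OTU counts, each OTU would need to be weighted by its counts from the OTU table.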
Could you please advise what the possible reasons for these differences are, and which classification is more reliable in this case?
I prefer the Greengenes classification, except that a high proportion of sequences is assigned to "Alphaproteobacteria;Other". Can I use the Silva classification as a basis for moving this portion to "Alphaproteobacteria;Rhizobiales"?
Could you also point me to sources or references that explain the structure of these databases? I find it hard to understand.
Thank you very much!
I am looking forward to your advice!
Yours sincerely,
An