Difference in taxonomic assignment: Greengenes vs. SILVA

baoanxh2006

Mar 27, 2016, 10:48:27 PM

to Qiime 1 Forum

Dear Qiime Users,

I have paired-end reads of the V1-V3 region of the 16S rRNA gene. My processing steps were: assembling the read pairs (using PEAR), removing primers (using multiple_extract_barcodes.py in QIIME 1.9.1), then filtering and truncating the sequences to 240 bp (following the UPARSE pipeline).

When assigning taxonomy with the Greengenes database (gg_13_8_otus) and with Silva_119 in the QIIME VirtualBox (v1.9.1), I found a substantial difference in the assignments within Proteobacteria, as shown below.

Commands:

1) assign_taxonomy.py -i otus.fa

2) assign_taxonomy.py -i otus.fa -r ../../../Downloads/Silva119_release/rep_set/97/Silva_119_rep_set97.fna -t ../../../Downloads/Silva119_release/taxonomy/97/taxonomy_97_7_levels.txt -o Silva_tax_assign/

For example:

Using Greengenes:

Taxon	Sample1	Sample2	Sample3	Sample4
k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales	10.506	19.796	8.937	16.577
k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;Other	16.310	20.642	16.145	1.233
k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales	1.764	40.493	3.217	19.899
k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Xanthomonadales	19.250	0.351	23.148	3.065
k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;Other	0.637	0.017	0.467	0.006

Using Silva (same sample order as above):

Taxon	Sample1	Sample2	Sample3	Sample4
D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rhizobiales	26.41278	39.74114	24.52475	16.96269
D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;Other	0.304246	0.621684	0.387718	0.8775022
D_0__Bacteria;D_1__Proteobacteria;D_2__Betaproteobacteria;D_3__Burkholderiales	14.31116	40.46679	17.30375	7.773602
D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Xanthomonadales	8.102449	0.357522	10.28378	2.75844
D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;Other	0.035862	0.111394	0.017779	0.101754

It seems that the sequences assigned to "Alphaproteobacteria;Other" with Greengenes were assigned to "Alphaproteobacteria;Rhizobiales" with Silva.

In addition, a large portion of the sequences assigned to "Gammaproteobacteria;Xanthomonadales" with Greengenes moved to "Betaproteobacteria;Burkholderiales" with Silva.

Could you please advise me on the possible reasons for this difference? Which classification is better in this case?

I prefer the classification from the Greengenes database, except that a high proportion of sequences was assigned to "Alphaproteobacteria;Other". Can I rely on the Silva classification to move this portion to "Alphaproteobacteria;Rhizobiales"?

Please also suggest a source or reference where I can read about how these databases are structured (I find it hard to understand).

Thank you very much!

I am looking forward to your advice!

Yours sincerely,

An

Jenya Kopylov

Mar 28, 2016, 10:54:57 AM

to Qiime 1 Forum

Hi An,

Most likely, some abundant OTUs are matching different reference sequences in Greengenes and Silva with high percent identity (>90%), and both answers can be correct. This is a limitation of classifying short reads from a highly conserved region of the 16S gene.

However, you can investigate in more depth to confirm or rule this out (thanks, Tony!):

1. Create a filtered OTU table with just the taxa in question (filter_taxa_from_otu_table.py).

2. Convert the filtered OTU table to tab-delimited format:

$ biom convert -i filtered_otutable.biom --to-tsv --table-type="OTU table" -o filtered_otutable.txt --header-key taxonomy

3. Sort the table by abundance (for example, in Excel).

4. Use the sorted table to see which OTUs/taxa differ systematically, then use the OTU ID to look up the representative sequence (in 97_otus.fasta for Greengenes or in Silva_119_rep_set97.fna for Silva):

$ grep "OTU ID" -A 1 97_otus.fasta

5. BLAST this reference sequence at NCBI to see which of the two reference databases gives the more accurate result.
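If you would rather script steps 3 and 4 than use Excel and grep, here is a minimal Python sketch. It assumes the tab-delimited layout that biom convert writes with --header-key taxonomy (a "#OTU ID" header row and a trailing taxonomy column); the function names are hypothetical helpers, not part of QIIME. Unlike a plain grep, the FASTA lookup matches IDs exactly, so searching for one ID will not also return longer IDs that contain it.

```python
def read_otu_table(lines):
    """Parse a TSV OTU table from any iterable of lines (e.g. an open file).

    Returns (sample_names, rows); each row is (otu_id, counts, taxonomy).
    """
    header = None
    raw_rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Only the "#OTU ID" comment line carries the column names.
            if line.startswith("#OTU ID"):
                header = line.lstrip("#").split("\t")
            continue
        raw_rows.append(line.split("\t"))
    if header is None:
        raise ValueError("no '#OTU ID' header line found")
    has_taxonomy = header[-1] == "taxonomy"
    samples = header[1:-1] if has_taxonomy else header[1:]
    rows = []
    for fields in raw_rows:
        counts = [float(x) for x in fields[1:1 + len(samples)]]
        taxonomy = fields[-1] if has_taxonomy else ""
        rows.append((fields[0], counts, taxonomy))
    return samples, rows


def sort_by_abundance(rows):
    """Order rows by total abundance across all samples, largest first."""
    return sorted(rows, key=lambda row: sum(row[1]), reverse=True)


def fetch_sequences(fasta_lines, wanted_ids):
    """Return {id: sequence} for exact matches on the first header token."""
    wanted = set(wanted_ids)
    chunks = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            current = seq_id if seq_id in wanted else None
            if current is not None:
                chunks[current] = []
        elif current is not None and line:
            # Sequences may be wrapped over several lines; join them later.
            chunks[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}
```

To use it, open filtered_otutable.txt and the relevant FASTA file, pass the file handles to read_otu_table and fetch_sequences, and take the first few IDs from sort_by_abundance as the ones to BLAST.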

It would also be interesting to see whether the results from the two databases converge better if you set --similarity 0.97 (rather than the default --similarity 0.90) or even --similarity 0.99.
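One way to put a number on "converge" is to compare the two assignment files OTU by OTU at a fixed depth. The sketch below is only an illustration under a few assumptions: each assignment file is tab-delimited with the OTU ID in the first column and the taxonomy string in the second, rank prefixes look like "k__" (Greengenes) or "D_0__" (Silva), and missing or empty ranks are treated as "Other". The helper names are hypothetical.

```python
import re
from collections import Counter

# Strips rank prefixes such as "k__", "o__" (Greengenes) or "D_3__" (Silva).
_RANK_PREFIX = re.compile(r"^(?:[a-z]|D_\d+)__")


def normalize(taxonomy, depth=4):
    """Reduce a lineage string to a tuple of `depth` prefix-free names."""
    names = [_RANK_PREFIX.sub("", level.strip()) for level in taxonomy.split(";")]
    names = [name or "Other" for name in names[:depth]]
    names += ["Other"] * (depth - len(names))  # pad short lineages
    return tuple(names)


def read_assignments(lines):
    """Map OTU ID -> taxonomy string from any iterable of lines."""
    assignments = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            assignments[fields[0]] = fields[1]
    return assignments


def compare(first, second, depth=4):
    """Return (fraction of shared OTUs that agree, Counter of lineage pairs)."""
    shared = sorted(set(first) & set(second))
    pairs = Counter(
        (normalize(first[otu], depth), normalize(second[otu], depth))
        for otu in shared
    )
    agreeing = sum(n for (a, b), n in pairs.items() if a == b)
    fraction = agreeing / len(shared) if shared else 0.0
    return fraction, pairs
```

Running compare on the Greengenes and Silva assignments at each --similarity setting gives one agreement fraction per setting, and the most common disagreeing pairs in the Counter show where the two databases diverge. Note that this counts OTUs, not reads, so it is not weighted by abundance.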

Let me know if you need more details regarding the steps listed above.

Jenya

baoanxh2006

Mar 29, 2016, 4:57:28 AM

to Qiime 1 Forum

Thank you very much, Jenya!

I will give it a try and see.

Kind regards,

An
