Why are there differences between OTU ID and taxonomy in my OTU table?


Émilie Tremblay

Apr 24, 2017, 1:25:11 PM
to Qiime 1 Forum
Hi,
I generated an OTU table with a custom-made database.

There are differences between the OTU ID and the taxonomy columns, and I wonder what is going on and which one has the "right" identification.

(See table below)

In the #OTU ID column, there are some identifications (salixsoil, cryptogea, etc.) and some weird names (New.ReferenceOTU18??), while the associated taxonomy column says "No blast hit" (though sometimes the taxonomy column does contain an identification down to the species level).

Can someone help me here? I don't understand what the table columns mean.
Thanks


# Constructed from biom file
#OTU ID DRS06 DRS15 DRS01 DRS30 DRS48 DRS49 DRS04 DRS05 DRS17 DRS44 DRS32 DRS22 DRS28 DRS03 DRS33 DRS34 DRS14 DRS29 DRS45 DRS27 DRS13 DRS36 DRS35 DRS20 DRS43 DRS21 DRS19 DRS07 DRS08 DRS31 DRS02 DRS38 DRS09 DRS51 DRS37 DRS47 DRS12 DRS46 DRS39 taxonomy
PGCHLAMYDO 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 No blast hit
PINICITRICOLA1 0 136 40 23 63 8 4 10 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 No blast hit
CRYPTOGEA 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 No blast hit
SALIXSOIL 1118 147 3 32 175 124 162 25 20 120 934 2276 144 301 44 29 505 5 346 81 145 376 92 135 196 312 138 22 16 12 10 1 3 0 0 0 0 0 0 No blast hit
LAGOARIANALIKE 14 44 177 36 193 274 47 9 1 772 0 10 137 153 70 815 4 214 1241 66 14 31 211 50 597 5 98 232 52 531 472 0 47 1907 512 129 51 81 0 No blast hit
IRRIGATA 0 0 0 0 0 0 0 0 0 0 0 0 0 9 36 58 0 0 0 0 0 4 32 12 0 0 10 1106 874 0 19 0 16 0 0 43 172 191 1477 No blast hit
PLURIVORA 17 0 0 43 4 0 8 0 6 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 No blast hit
New.ReferenceOTU18 2 0 0 0 4 1 0 6 0 1 0 9 0 0 1 0 0 0 0 3 0 0 0 1 0 12 0 1 3 0 0 0 0 0 0 0 0 0 0 No blast hit
New.ReferenceOTU19 611 98 2 7 95 69 102 19 7 65 463 1581 55 216 62 19 231 3 140 62 122 142 69 82 117 194 53 8 12 3 3 1 1 0 0 0 0 0 1 No blast hit
New.ReferenceOTU12 0 0 4 0 2 9 1 0 0 3 0 0 1 1 2 6 0 4 1 1 0 0 0 1 3 0 0 1 1 0 5 0 0 6 6 0 1 0 0 No blast hit
New.ReferenceOTU13 0 0 8 0 3 15 0 1 1 15 0 0 5 3 1 9 2 6 25 4 2 3 6 2 17 0 4 0 2 8 13 0 0 14 11 2 1 3 0 No blast hit
New.ReferenceOTU10 1 10 50 6 46 74 10 5 0 227 0 2 24 45 16 144 2 58 257 12 2 5 42 13 124 0 24 57 9 101 112 0 15 489 130 36 13 21 0 No blast hit

Greg Caporaso

Apr 25, 2017, 5:39:32 PM
to Qiime 1 Forum
Hello,
The first column of that table contains the OTU ids, and the last column contains the taxonomy assigned by BLAST. The OTU ids will either be ids from your reference database (I'm guessing that's what 'PGCHLAMYDO', 'PINICITRICOLA1', ... are) or new OTU ids defined during the open-reference OTU picking process (described in Rideout et al. (2014)). The BLAST assignments are made during the taxonomy assignment step of the pick_open_reference_otus.py workflow, independently of the OTU assignment step.

If you're concerned that many of the sequences are not having taxonomy assigned with BLAST, you can try relaxing some of the parameters (for example, increasing the --blast_e_value value, though its default is already pretty high), or try assignment with the RDP classifier or uclust. All of these options can be set by running the assign_taxonomy.py script directly, or by setting parameters for assign_taxonomy.py in a parameters file that you pass to pick_open_reference_otus.py.
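To gauge how much of the table is affected before changing any parameters, one option is to tally the OTUs and reads whose taxonomy column reads "No blast hit". The following is a minimal Python sketch, not part of QIIME. It assumes the table has been exported as tab-separated text in the layout shown above; the function name `summarize_unassigned` and the row `HYPOTHETICAL_OTU` are made up for illustration, while the other rows reuse the first three sample columns from the table in this thread.

```python
import csv
import io

# Labels treated as "no taxonomy assigned" (assumption: only these two appear).
UNASSIGNED_LABELS = {"No blast hit", "Unassigned"}


def summarize_unassigned(table_text):
    """Return (unassigned OTUs, total OTUs, unassigned reads, total reads)."""
    n_otus = n_unassigned = 0
    total_reads = unassigned_reads = 0.0
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        # Skip blank lines and the two header lines, which start with '#'.
        if not row or row[0].startswith("#"):
            continue
        counts = [float(value) for value in row[1:-1]]
        taxonomy = row[-1].strip()
        n_otus += 1
        total_reads += sum(counts)
        if taxonomy in UNASSIGNED_LABELS:
            n_unassigned += 1
            unassigned_reads += sum(counts)
    return n_unassigned, n_otus, unassigned_reads, total_reads


example_table = "\n".join([
    "# Constructed from biom file",
    "#OTU ID\tDRS06\tDRS15\tDRS01\ttaxonomy",
    "PGCHLAMYDO\t1\t0\t0\tNo blast hit",
    "SALIXSOIL\t1118\t147\t3\tNo blast hit",
    # Hypothetical row, added only so the example contains an assigned OTU.
    "HYPOTHETICAL_OTU\t5\t0\t0\tk__Example; p__Example",
])

summary = summarize_unassigned(example_table)
print("unassigned OTUs: %d of %d" % (summary[0], summary[1]))
print("unassigned reads: %.0f of %.0f" % (summary[2], summary[3]))
```

If most reads fall under "No blast hit", that may be a sign that the reference sequences or the ID-to-taxonomy mapping passed to assign_taxonomy.py deserve a second look.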

Hope this helps! 

Greg

Émilie Tremblay

Apr 25, 2017, 7:09:45 PM
to Qiime 1 Forum
Hi Greg and thanks for your response.
Knowing my data, I am not too concerned about the "No blast hit" entries, but I just want to confirm that I should use the taxonomy column for my downstream analyses.
That is, to identify the species found in my samples, I should use the information in the taxonomy column, right?
I am just unsure what "useful" information the OTU ID column gives me... I must be misunderstanding something.
Thanks.

Jai Ram Rideout

Apr 26, 2017, 6:20:54 PM
to Qiime 1 Forum
Hello,

The "OTU ID" column contains identifiers for each OTU found in your dataset during the OTU picking step. Each OTU has a taxonomic annotation associated with it (the "taxonomy" column), and more than one OTU can have the same taxonomic classification. For downstream taxonomic analyses (e.g. summarize_taxa.py), QIIME will use the "taxonomy" column to analyze your data at specific taxonomic levels (e.g. species level).

Here's an example of what a typical .biom file has in its "OTU ID" and "taxonomy" columns:

# Constructed from biom file
#OTU ID PC.636  PC.481  PC.354  PC.635  PC.593  PC.356  PC.355  PC.607  PC.634  taxonomy
denovo0 1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     Unassigned
denovo1 1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Staphylococcaceae; g__Staphylococcus; s__
denovo2 0.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__
denovo3 0.0     0.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0     k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae
denovo4 0.0     1.0     0.0     1.0     0.0     0.0     0.0     0.0     0.0     k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__
...

In this example, de novo OTU picking was used, so the OTU IDs are generated by QIIME ("denovo0", "denovo1", etc). The "taxonomy" column contains taxonomic annotations assigned by assign_taxonomy.py using a reference database (Greengenes in this example).
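To illustrate how several OTUs can share one classification and be analyzed together at a given taxonomic level, here is a minimal Python sketch that sums per-sample counts for OTUs with the same lineage prefix. It is only an approximation of the idea behind summarize_taxa.py, not a reimplementation of it (it does no relative-abundance conversion, for instance). The function name `collapse_by_taxonomy` is hypothetical, and the rows use the first four sample columns (PC.636, PC.481, PC.354, PC.635) of the example table above.

```python
def collapse_by_taxonomy(otu_rows, level):
    """Sum per-sample counts of OTUs that share the first `level` ranks.

    otu_rows: iterable of (otu_id, counts, taxonomy_string) tuples.
    level:    number of ranks to keep (here 4 means kingdom through order).
    """
    collapsed = {}
    for otu_id, counts, taxonomy in otu_rows:
        ranks = [rank.strip() for rank in taxonomy.split(";")]
        # Lineages shorter than `level` (e.g. "Unassigned") are kept whole.
        key = "; ".join(ranks[:level])
        running = collapsed.get(key, [0.0] * len(counts))
        collapsed[key] = [a + b for a, b in zip(running, counts)]
    return collapsed


rows = [
    ("denovo0", [1.0, 0.0, 0.0, 0.0], "Unassigned"),
    ("denovo1", [1.0, 0.0, 0.0, 0.0],
     "k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; "
     "f__Staphylococcaceae; g__Staphylococcus; s__"),
    ("denovo2", [0.0, 1.0, 0.0, 0.0],
     "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__"),
    ("denovo3", [0.0, 0.0, 1.0, 0.0],
     "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae"),
    ("denovo4", [0.0, 1.0, 0.0, 1.0],
     "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__"),
]

by_order = collapse_by_taxonomy(rows, level=4)
for lineage, counts in sorted(by_order.items()):
    print(lineage, counts)
```

At the order level, denovo2, denovo3 and denovo4 all fall under o__Clostridiales, so their counts are added together even though they are distinct OTUs. This is why the OTU ID column identifies clusters of sequences, while the taxonomy column is what drives taxonomic summaries.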

The OTU IDs are determined by the OTU picking method you choose, as well as your reference sequence IDs. It looks like you're using open-reference OTU picking, which uses a combination of closed-reference and de novo OTU picking to produce the final set of OTUs. Thus, your OTU IDs will consist of reference sequence IDs ("PGCHLAMYDO", "LAGOARIANALIKE", etc.), as well as IDs generated by QIIME ("New.ReferenceOTU18", "New.ReferenceOTU19", etc.). The taxonomic annotations assigned to each OTU come from your custom reference database's "ID-to-taxonomy mapping file" that is provided to assign_taxonomy.py. It looks like you're using BLAST as the taxonomy assignment method and aren't getting successful classifications (indicated by "No blast hit").
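As a rough way to see which OTU IDs in a table came from the reference database and which were generated during OTU picking, the IDs can be compared against the set of reference sequence IDs. The Python sketch below is an illustration only: the function name `otu_id_origin` is hypothetical, the "New." prefix check is an assumption based on the IDs seen in this thread, and the reference ID set is typed in by hand here (in practice it would be read from the headers of the reference FASTA file).

```python
# Assumption: IDs generated during open-reference picking start with "New.",
# as in "New.ReferenceOTU18" from the table in this thread.
NEW_OTU_PREFIX = "New."


def otu_id_origin(otu_id, reference_ids):
    """Classify an OTU ID as 'reference', 'new', or 'unknown'."""
    if otu_id in reference_ids:
        return "reference"
    if otu_id.startswith(NEW_OTU_PREFIX):
        return "new"
    return "unknown"


# IDs taken from this thread; assumed to be headers in the custom database.
reference_ids = {"PGCHLAMYDO", "SALIXSOIL", "LAGOARIANALIKE"}
table_ids = ["SALIXSOIL", "New.ReferenceOTU18", "LAGOARIANALIKE", "New.ReferenceOTU19"]

origins = {otu_id: otu_id_origin(otu_id, reference_ids) for otu_id in table_ids}
for otu_id in table_ids:
    print(otu_id, "->", origins[otu_id])
```

Either kind of OTU ID only names a cluster of sequences. A reference ID that happens to look like a species name (e.g. "SALIXSOIL") is not a taxonomy assignment in itself; the assignment is whatever appears in the taxonomy column.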

For a description of the open-reference OTU picking method implemented in QIIME 1 (i.e. the subsampled approach), see this paper. Since you have a custom reference database, it might also be useful to look through some of the existing QIIME-compatible reference databases for examples of how the files should be formatted.

Does this clarify where the values in these columns are coming from?

Best,
Jai