Why are there differences between OTU ID and taxonomy in my OTU table?

Émilie Tremblay

Apr 24, 2017, 1:25:11 PM

to Qiime 1 Forum

Hi,

I generated an OTU table with a custom-made database.

There are differences between the OTU ID and the taxonomy and I wonder what is going on, and which one has the "right" identification.

(See table below)

The #OTU ID column contains some identifications (salixsoil, cryptogea, etc.) and some weird names (New.ReferenceOTU18??), while the associated taxonomy column says "No blast hit" (although sometimes the taxonomy column also contains identifications down to the species level).

Can someone help me here? I don't understand what the table columns mean.

Thanks

# Constructed from biom file
#OTU ID	DRS06	DRS15	DRS01	DRS30	DRS48	DRS49	DRS04	DRS05	DRS17	DRS44	DRS32	DRS22	DRS28	DRS03	DRS33	DRS34	DRS14	DRS29	DRS45	DRS27	DRS13	DRS36	DRS35	DRS20	DRS43	DRS21	DRS19	DRS07	DRS08	DRS31	DRS02	DRS38	DRS09	DRS51	DRS37	DRS47	DRS12	DRS46	DRS39	taxonomy
PGCHLAMYDO	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	No blast hit
PINICITRICOLA1	0	136	40	23	63	8	4	10	2	3	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	No blast hit
CRYPTOGEA	0	0	0	0	0	0	6	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	No blast hit
SALIXSOIL	1118	147	3	32	175	124	162	25	20	120	934	2276	144	301	44	29	505	5	346	81	145	376	92	135	196	312	138	22	16	12	10	1	3	0	0	0	0	0	0	No blast hit
LAGOARIANALIKE	14	44	177	36	193	274	47	9	1	772	0	10	137	153	70	815	4	214	1241	66	14	31	211	50	597	5	98	232	52	531	472	0	47	1907	512	129	51	81	0	No blast hit
IRRIGATA	0	0	0	0	0	0	0	0	0	0	0	0	0	9	36	58	0	0	0	0	0	4	32	12	0	0	10	1106	874	0	19	0	16	0	0	43	172	191	1477	No blast hit
PLURIVORA	17	0	0	43	4	0	8	0	6	0	3	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	No blast hit
New.ReferenceOTU18	2	0	0	0	4	1	0	6	0	1	0	9	0	0	1	0	0	0	0	3	0	0	0	1	0	12	0	1	3	0	0	0	0	0	0	0	0	0	0	No blast hit
New.ReferenceOTU19	611	98	2	7	95	69	102	19	7	65	463	1581	55	216	62	19	231	3	140	62	122	142	69	82	117	194	53	8	12	3	3	1	1	0	0	0	0	0	1	No blast hit
New.ReferenceOTU12	0	0	4	0	2	9	1	0	0	3	0	0	1	1	2	6	0	4	1	1	0	0	0	1	3	0	0	1	1	0	5	0	0	6	6	0	1	0	0	No blast hit
New.ReferenceOTU13	0	0	8	0	3	15	0	1	1	15	0	0	5	3	1	9	2	6	25	4	2	3	6	2	17	0	4	0	2	8	13	0	0	14	11	2	1	3	0	No blast hit
New.ReferenceOTU10	1	10	50	6	46	74	10	5	0	227	0	2	24	45	16	144	2	58	257	12	2	5	42	13	124	0	24	57	9	101	112	0	15	489	130	36	13	21	0	No blast hit

Greg Caporaso

Apr 25, 2017, 5:39:32 PM

to Qiime 1 Forum

Hello,

The first column of that table contains the OTU ids, and the last column contains the taxonomy assigned by BLAST. The OTU ids will either be ids from your reference database (I'm guessing that's what 'PGCHLAMYDO', 'PINICITRICOLA1', ... are) or new OTU ids defined during the open-reference OTU picking process (described in Rideout et al. (2014)). The BLAST assignments are made during the taxonomy assignment step of the pick_open_reference_otus.py workflow, independently of the OTU assignment step. If you're concerned that many of the sequences are not being assigned taxonomy with BLAST, you can try relaxing some of the parameters (for example, increasing the --blast_e_value value, though its default is already pretty high), or try assignment with the RDP classifier or uclust. All of these options can be applied by running the assign_taxonomy.py script directly, or by setting parameters for assign_taxonomy.py in a parameters file that you pass to pick_open_reference_otus.py.
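
To see how the two kinds of OTU ids and the BLAST results are distributed in a table like the one above, a small script can tally them. This is only a sketch, not part of QIIME: the function name summarize_otu_table is made up for illustration, and it assumes new OTUs carry the default "New." prefix (New.ReferenceOTU<N> or New.CleanUp.ReferenceOTU<N>), so every other id is treated as a reference id.

```python
import re

# Assumption: new OTUs use the default "New." prefix, e.g.
# New.ReferenceOTU18 or New.CleanUp.ReferenceOTU3. Anything else is
# treated as an id taken from the reference database.
NEW_OTU_PATTERN = re.compile(r"^New\.(CleanUp\.)?ReferenceOTU\d+$")


def summarize_otu_table(text):
    """Count reference ids, new ids, and OTUs labelled 'No blast hit'.

    `text` is a classic tab-separated OTU table whose last column
    is the taxonomy column.
    """
    summary = {"reference": 0, "new": 0, "no_blast_hit": 0}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the '#' header lines
        fields = line.split("\t")
        otu_id, taxonomy = fields[0], fields[-1]
        origin = "new" if NEW_OTU_PATTERN.match(otu_id) else "reference"
        summary[origin] += 1
        if taxonomy.strip().lower() == "no blast hit":
            summary["no_blast_hit"] += 1
    return summary


# Two rows from the table above, cut down to the first two samples.
example = (
    "# Constructed from biom file\n"
    "#OTU ID\tDRS06\tDRS15\ttaxonomy\n"
    "SALIXSOIL\t1118\t147\tNo blast hit\n"
    "New.ReferenceOTU18\t2\t0\tNo blast hit\n"
)
print(summarize_otu_table(example))
```

The id counts and the "No blast hit" count are tallied separately because, as described above, OTU picking and taxonomy assignment are independent steps.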

Hope this helps!

Greg

Émilie Tremblay

Apr 25, 2017, 7:09:45 PM

to Qiime 1 Forum

Hi Greg and thanks for your response.

Knowing my data, I am not too concerned about the "No blast hit" entries, but I just want to confirm that I should use the taxonomy column for my downstream analyses.

That is, to identify the species found in my samples, I must use the information in the taxonomy column, right?

I am just unsure of what "useful" information the OTU ID column gives me... I must be misunderstanding something?

Thanks.

Jai Ram Rideout

Apr 26, 2017, 6:20:54 PM

to Qiime 1 Forum

Hello,

The "OTU ID" column contains identifiers for each OTU found in your dataset during the OTU picking step. Each OTU has a taxonomic annotation associated with it (the "taxonomy" column), and more than one OTU can have the same taxonomic classification. For downstream taxonomic analyses (e.g. summarize_taxa.py), QIIME will use the "taxonomy" column to analyze your data at specific taxonomic levels (e.g. species level).
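
To make the idea of several OTUs sharing one classification concrete, here is a rough sketch of collapsing OTU counts by the "taxonomy" column. It is not QIIME's implementation: collapse_by_taxonomy is a hypothetical helper, it keeps absolute counts (summarize_taxa.py reports relative abundances by default), and it does nothing special for lineages shorter than the requested level. The rows are taken from the Greengenes-annotated example table further down in this post, cut down to the first four samples.

```python
def collapse_by_taxonomy(rows, level):
    """Sum per-sample counts of OTUs that share a lineage down to `level`.

    rows:  iterable of (otu_id, counts, taxonomy) tuples
    level: number of ranks to keep (1 = kingdom, ..., 7 = species)
    """
    totals = {}
    for _otu_id, counts, taxonomy in rows:
        ranks = [rank.strip() for rank in taxonomy.split(";")]
        key = ";".join(ranks[:level])
        previous = totals.get(key, [0.0] * len(counts))
        totals[key] = [a + b for a, b in zip(previous, counts)]
    return totals


rows = [
    ("denovo1", [1.0, 0.0, 0.0, 0.0],
     "k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; "
     "f__Staphylococcaceae; g__Staphylococcus; s__"),
    ("denovo2", [0.0, 1.0, 0.0, 0.0],
     "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__"),
    ("denovo3", [0.0, 0.0, 1.0, 0.0],
     "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae"),
    ("denovo4", [0.0, 1.0, 0.0, 1.0],
     "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__"),
]

# Collapse to order level (4 ranks): the three Clostridiales OTUs merge.
for lineage, counts in collapse_by_taxonomy(rows, level=4).items():
    print(lineage, counts)
```

The OTU ids play no role in the grouping. They only identify the individual rows that are summed.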

Here's an example of what a typical .biom file has in its "OTU ID" and "taxonomy" columns:

# Constructed from biom file
#OTU ID PC.636  PC.481  PC.354  PC.635  PC.593  PC.356  PC.355  PC.607  PC.634  taxonomy
denovo0 1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     Unassigned
denovo1 1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Staphylococcaceae; g__Staphylococcus; s__
denovo2 0.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__
denovo3 0.0     0.0     1.0     0.0     0.0     0.0     0.0     0.0     0.0     k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae
denovo4 0.0     1.0     0.0     1.0     0.0     0.0     0.0     0.0     0.0     k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__
...

In this example, de novo OTU picking was used, so the OTU IDs are generated by QIIME ("denovo0", "denovo1", etc). The "taxonomy" column contains taxonomic annotations assigned by assign_taxonomy.py using a reference database (Greengenes in this example).

The OTU IDs are determined by the OTU picking method you choose, as well as your reference sequence IDs. It looks like you're using open-reference OTU picking, which uses a combination of closed-reference and de novo OTU picking to produce the final set of OTUs. Thus, your OTU IDs will consist of reference sequence IDs ("PGCHLAMYDO", "LAGOARIANALIKE", etc.), as well as IDs generated by QIIME ("New.ReferenceOTU18", "New.ReferenceOTU19", etc.). The taxonomic annotations assigned to each OTU come from your custom reference database's "ID-to-taxonomy mapping file" that is provided to assign_taxonomy.py. It looks like you're using BLAST as the taxonomy assignment method and aren't getting successful classifications (indicated by "No blast hit").
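
Because the annotations depend on the ID-to-taxonomy mapping file, one quick sanity check for a custom database is to confirm that every reference sequence ID has an entry in that file. The sketch below is an illustration only: missing_taxonomy_ids is a hypothetical helper, it assumes a FASTA file of reference sequences and a tab-separated mapping file (sequence ID, then taxonomy string), and the sequences and lineage in the example are placeholders rather than real data.

```python
def missing_taxonomy_ids(fasta_text, id_to_taxonomy_text):
    """Return reference sequence ids that have no line in the mapping file."""
    seq_ids = []
    for line in fasta_text.splitlines():
        header = line[1:].strip() if line.startswith(">") else ""
        if header:
            # The id is the first whitespace-delimited token of the header.
            seq_ids.append(header.split()[0])

    mapped_ids = set()
    for line in id_to_taxonomy_text.splitlines():
        if line.strip():
            # Assumed layout: <sequence id><TAB><taxonomy string>
            mapped_ids.add(line.split("\t", 1)[0].strip())

    return [seq_id for seq_id in seq_ids if seq_id not in mapped_ids]


# Placeholder sequences and lineage, for illustration only.
fasta = ">SALIXSOIL\nACGT\n>CRYPTOGEA\nACGT\n"
mapping = "SALIXSOIL\tk__Placeholder; p__Placeholder\n"
print(missing_taxonomy_ids(fasta, mapping))
```

An empty list means the two files agree on the set of IDs. It says nothing about whether BLAST will find hits for your reads.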

For a description of the open-reference OTU picking method implemented in QIIME 1 (i.e. the subsampled approach), see Rideout et al. (2014), cited earlier in this thread. Since you have a custom reference database, it might also be useful to look through some of the existing QIIME-compatible reference databases for examples of how the files should be formatted.

Does this clarify where the values in these columns are coming from?

Best,

Jai
