Minimum Alignment Sequence Length To Assign Taxonomy

Mykhaylo Usyk

Aug 22, 2017, 5:14:46 PM
to Qiime 1 Forum
Hello,

I am running an ITS pipeline and have a question about how BLAST assigns taxonomy. What is the minimum number of nucleotides of the representative OTU sequence that must align to a reference sequence before it counts as a hit in the BLAST database? I ask because I looked at one representative sequence, which was 367 bp, and manually aligned it to its BLAST hit. Only 62 bp of the query aligned. The percent identity was high (94%) and the e-value was 1e-24, but most of the query was not used in the alignment. At what length is the aligned fragment too short to produce a valid result?
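To put a number on "most of the query was not used": query coverage is the fraction of the query's bases that take part in the alignment. A minimal sketch using the lengths above; the function name `query_coverage` is hypothetical and not part of QIIME. Note that BLAST's reported alignment length can include gap columns, so counting query bases as `qend - qstart + 1` is usually the closer measure.

```python
def query_coverage(aligned_query_bases: int, query_length: int) -> float:
    """Fraction of the query sequence covered by the alignment (0.0 to 1.0)."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return aligned_query_bases / query_length


# Values from the post: a 367 bp representative sequence with 62 bp aligned.
cov = query_coverage(62, 367)
print(f"{cov:.1%}")  # prints "16.9%"
```

So the 94% identity describes only about a sixth of the sequence.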

Best,
Mike

Colin Brislawn

Aug 22, 2017, 5:35:38 PM
to Qiime 1 Forum
Hello Mike,

This is a great question because it forces us to evaluate exactly how the BLAST taxonomy assigner in QIIME works. To do this, I took a look at the source code:

While I see BLAST hits being filtered by e-value and identity, I don't see any mention of coverage. This could explain your 62 bp alignment.
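As an illustration of what an added coverage check could look like, the sketch below keeps a hit only if it passes e-value, percent identity, and query coverage thresholds. The `BlastHit` record, the `passes_filters` function, and the default thresholds (including the 80% coverage floor) are hypothetical choices for this example, not QIIME's actual code or defaults.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical record of one BLAST hit (1-based, inclusive query coordinates)."""
    subject_id: str
    percent_identity: float
    evalue: float
    query_start: int
    query_end: int
    query_length: int

    @property
    def query_coverage(self) -> float:
        # Query bases spanned by the alignment, as a fraction of the whole query.
        span = abs(self.query_end - self.query_start) + 1
        return span / self.query_length


def passes_filters(hit: BlastHit, max_evalue: float = 1e-3,
                   min_identity: float = 90.0, min_coverage: float = 0.8) -> bool:
    """Keep a hit only if it clears all three thresholds (hypothetical defaults)."""
    return (hit.evalue <= max_evalue
            and hit.percent_identity >= min_identity
            and hit.query_coverage >= min_coverage)


# The hit from the question: 94% identity, e-value 1e-24, but only 62 of
# 367 query bases aligned. The start/end coordinates are illustrative.
hit = BlastHit("ref_seq", 94.0, 1e-24, query_start=1, query_end=62, query_length=367)
print(passes_filters(hit, min_coverage=0.0))  # e-value and identity only: True
print(passes_filters(hit))                    # with an 80% coverage floor: False
```

With only e-value and identity checks the short alignment is accepted; any reasonable coverage floor rejects it.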

If you use the SortMeRNA program for taxonomy search, you can specify coverage with --sortmerna_coverage:

I hope this helps,
Colin
