question about -minIdentity for command-line blat


liuxq

Dec 4, 2012, 4:26:29 AM
to gen...@soe.ucsc.edu
Hi all,
I am Xiaoqiao Liu, a blat user in Beijing, China.

I use the command-line version of blat with default parameters, except for -q=rna.

I am looking at blat's identity cutoff. According to the blat manual, the default value of -minIdentity for nucleotide searches is 90.

Yet when I use a Perl script to compute percent identity from blat's PSL output, following the blat FAQ (http://genome.ucsc.edu/FAQ/FAQblat.html), many PSL records have an identity below 90%.

How can this be explained?

Kind regards,
xiaoqiao

Steve Heitner

Dec 5, 2012, 12:11:20 PM
to liuxq, gen...@soe.ucsc.edu
Hello, Xiaoqiao.

Could you please provide an example of the command line you are using
including the database and sequence you are querying?

Please contact us again at gen...@soe.ucsc.edu if you have any further
questions.

---
Steve Heitner
UCSC Genome Bioinformatics Group

liuxq

Dec 6, 2012, 2:23:05 AM
to st...@soe.ucsc.edu, gen...@soe.ucsc.edu
Hi Steve,

I run blat with nearly default settings. My command is:

~/tools/blat/blat ~/tools/blat/hg19_chr_list input.fasta -q=rna output_blat.psl

The database is the hg19 reference genome, including the unplaced (Un) sequences and the alternative loci. Below are one query sequence and the PSL record it produced. For this record I compute an identity of 88.3495145631068%, which is below the default -minIdentity of 90%.

>comp136_c0_seq1
GTTCAGCCACTGCGTTGATCCTCCATGGGGGCCTGCCATACAGTGCTCTGGCGAGGCGTC
CCAGTGGGGCAAATGCCTAACCAGGAGCGCCCTCAGGATCCGTGTTGCTCGGGCTGGTTG
GAGTCCCCTGCAGGGATGTTCCACAGGGCAGGTTTAAGCCGCCTAAGGAGCTGCCTTGAC
CATCCGCCATTCACCTCGCTTCCCAGTCAGGGAA

190 16 0 0 2 8 1 2 - comp136_c0_seq1 214 0 214 chr2 243199373 109140821 109141029 3 107,36,63, 0,112,151, 109140821,109140928,109140966,
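For reference, the 88.35% figure can be reproduced from the PSL line above. The sketch below is a Python transcription of the `pslCalcMilliBad` calculation shown in the blat FAQ, restricted to nucleotide alignments (no protein size multiplier), so it should be checked against the FAQ before being relied on. It uses floating-point division, which is an assumption about how the Perl script works; the C code in the FAQ uses integer arithmetic. The names `parse_psl` and `percent_identity` are illustrative.

```python
import math

# Column order of a headerless PSL line (21 whitespace-separated fields).
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]
TEXT_FIELDS = {"strand", "qName", "tName", "blockSizes", "qStarts", "tStarts"}


def parse_psl(line):
    """Split one PSL line into a dict, converting the numeric columns to int."""
    rec = dict(zip(PSL_FIELDS, line.split()))
    for name in PSL_FIELDS:
        if name not in TEXT_FIELDS:
            rec[name] = int(rec[name])
    return rec


def percent_identity(psl, is_mrna=True):
    """Percent identity as 100 - milliBad / 10, nucleotide alignments only."""
    q_ali = psl["qEnd"] - psl["qStart"]
    t_ali = psl["tEnd"] - psl["tStart"]
    milli_bad = 0.0
    total = psl["matches"] + psl["repMatches"] + psl["misMatches"]
    if min(q_ali, t_ali) > 0 and total != 0:
        size_dif = q_ali - t_ali
        if size_dif < 0:
            # For mRNA queries a longer target span (introns) is not penalized.
            size_dif = 0 if is_mrna else -size_dif
        insert_factor = psl["qNumInsert"]
        if not is_mrna:
            insert_factor += psl["tNumInsert"]
        penalty = psl["misMatches"] + insert_factor + round(3 * math.log(1 + size_dif))
        milli_bad = 1000.0 * penalty / total  # float division (assumption)
    return 100.0 - milli_bad * 0.1


line = ("190 16 0 0 2 8 1 2 - comp136_c0_seq1 214 0 214 chr2 243199373 "
        "109140821 109141029 3 107,36,63, 0,112,151, "
        "109140821,109140928,109140966,")
identity = percent_identity(parse_psl(line), is_mrna=True)
print(identity)
```

Here the penalty is 16 mismatches + 2 query inserts + round(3 ln 7) = 24 over 206 aligned bases, so this score also penalizes gaps and size differences rather than counting mismatches alone.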


Thanks, Steve. I look forward to your response.

Xiaoqiao


-----Original Message-----
From: Steve Heitner [mailto:st...@soe.ucsc.edu]
Sent: December 6, 2012 1:11
To: 'liuxq '; gen...@soe.ucsc.edu
Subject: RE: [genome] question about -minIdentity for command-line blat

Steve Heitner

Dec 6, 2012, 3:14:52 PM
to liuxq, gen...@soe.ucsc.edu
Hello, Xiaoqiao.

If you include query-side gaps or unaligned parts in the percent identity, you get 190/214 = 88.8%.

But gaps are not typically counted, because they fall between the aligning blocks. Leaving out the 8 query bases that sit in gaps gives about 92.2%:

hgwdev:~> calc '190/(214-8)'
190/(214-8) = 0.922330
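The same two ratios can be recomputed directly from the PSL fields of the record in question. This is only a small Python sketch of the arithmetic above; the variable names are illustrative and the values are copied by hand from the PSL line.

```python
# Values taken from the PSL record for comp136_c0_seq1.
matches, mismatches, rep_matches = 190, 16, 0
q_size, q_base_insert = 214, 8
block_sizes = [107, 36, 63]

# Bases inside aligned blocks; gaps between blocks are not included.
aligned = sum(block_sizes)

# Sanity checks: the blocks account for every match and mismatch,
# and the query bases left over are exactly the query-side gap bases.
assert aligned == matches + mismatches + rep_matches
assert aligned == q_size - q_base_insert

identity_with_gaps = matches / q_size      # gaps counted against the alignment
identity_without_gaps = matches / aligned  # gaps ignored

print(f"{identity_with_gaps:.4f} {identity_without_gaps:.6f}")
```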

Sometimes gaps are huge and, if included, they would completely distort the percent identity score. On the other hand, you can have poor coverage while the small part that actually does align has 100% identity.

Typically, people use pslCDnaFilter, as it has many useful filtering options that are much more flexible than the -minScore and -minIdentity options that command-line blat offers.
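To illustrate why identity and coverage are worth filtering on separately, here is a minimal Python sketch. It is not pslCDnaFilter and does not reproduce its algorithm or its options; the function name `filter_psl` and both thresholds are hypothetical. Identity is taken over aligned bases only (gaps ignored), and coverage is the fraction of the query that falls inside aligned blocks.

```python
def filter_psl(lines, min_identity=0.90, min_coverage=0.50):
    """Keep PSL lines that pass both an identity and a coverage threshold."""
    kept = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the optional PSL header.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches, mismatches, rep_matches = (int(x) for x in fields[:3])
        q_size = int(fields[10])
        aligned = matches + mismatches + rep_matches
        if aligned == 0 or q_size == 0:
            continue
        identity = (matches + rep_matches) / aligned  # gaps not counted
        coverage = aligned / q_size                   # share of query aligned
        if identity >= min_identity and coverage >= min_coverage:
            kept.append(line)
    return kept


record = ("190 16 0 0 2 8 1 2 - comp136_c0_seq1 214 0 214 chr2 243199373 "
          "109140821 109141029 3 107,36,63, 0,112,151, "
          "109140821,109140928,109140966,")
print(len(filter_psl([record])))                     # passes the default thresholds
print(len(filter_psl([record], min_identity=0.95)))  # fails a stricter identity cut
```

A record with 100% identity over a short stretch of the query would pass the identity test but fail the coverage test, which is the case a single identity cutoff cannot catch.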