Understanding Blat Score Calculation


dennis

Mar 30, 2015, 12:01:27 PM
to gen...@soe.ucsc.edu
I have seen this question discussed, but all the answers I have found
just quote the Blat web page. Suppose I have a 120 nt query and get a 28 nt
hit, with 24 nt being a 100% match and a 3 nt gap. Blat reports the score
as 24, but 28*2-64-x is a negative number. This is based on the
2*match-mismatch-gap_penalty formula described on the web page. Can one
of you folks explain where I am making a mistake?

Steve Heitner

Mar 30, 2015, 3:33:16 PM
to dennis, gen...@soe.ucsc.edu
Hello, Dennis.

The calculation you should be using is referenced in http://genome.ucsc.edu/FAQ/FAQblat.html#blat4. The relevant portion of the script is:

my $pslScore = $sizeMul * ($matches + ( $repMatches >> 1) ) - $sizeMul * $misMatches - $qNumInsert - $tNumInsert;
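As a minimal sketch (not the official pslScore.pl script), the same line can be wrapped in a standalone Perl subroutine. Here sizeMul is passed in by the caller, which is an assumption: in the original script it is determined separately. The name psl_score and the sample numbers are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the quoted pslScore formula. $size_mul is supplied by the caller
# (assumption: 3 for protein alignments, 1 for nucleotide alignments).
sub psl_score {
    my ($matches, $mis_matches, $rep_matches,
        $q_num_insert, $t_num_insert, $size_mul) = @_;
    # Repeat matches count at half weight; ">> 1" halves and rounds down.
    return $size_mul * ($matches + ($rep_matches >> 1))
         - $size_mul * $mis_matches
         - $q_num_insert
         - $t_num_insert;
}

# Hypothetical nucleotide alignment: 24 matches, no mismatches,
# no repeat matches, no query inserts, one target insert.
print psl_score(24, 0, 0, 0, 1, 1), "\n";    # prints 23
```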

The value of sizeMul is 3 if your query is a protein sequence and 1 otherwise. Since we're not dealing with a protein sequence, sizeMul is 1, so, ignoring the repMatches term (bases that match within repeats, which count at half weight), the formula is essentially:

pslScore = #matches - #misMatches - #qInserts - #tInserts

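Note that qInserts and tInserts count the number of gaps, not the number of gapped bases, so a single 3 nt gap lowers the score by only 1. There is no 2*match term for nucleotide queries. Based on what you described, it sounds like the score is roughly what it should be.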
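The following hedged sketch scores a single PSL line for a nucleotide alignment (sizeMul = 1), showing which columns the formula actually uses. It assumes a tab-separated line with the standard 21 PSL columns. The helper name score_from_psl_line and the example line are hypothetical and not taken from your query: 24 matching bases in two 12 nt blocks separated by a 3 nt gap in the target, which lowers the score by 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: score one PSL data line for a nucleotide alignment (sizeMul = 1).
# The first eight PSL columns are: matches, misMatches, repMatches, nCount,
# qNumInsert, qBaseInsert, tNumInsert, tBaseInsert.
sub score_from_psl_line {
    my ($line) = @_;
    chomp $line;
    my ($matches, $mis_matches, $rep_matches, $n_count,
        $q_num_insert, $q_base_insert, $t_num_insert, $t_base_insert)
        = (split /\t/, $line)[0 .. 7];
    # Only the gap counts are subtracted; the gap lengths
    # ($q_base_insert, $t_base_insert) and $n_count do not enter the score.
    return $matches + ($rep_matches >> 1)
         - $mis_matches - $q_num_insert - $t_num_insert;
}

# Hypothetical PSL line (illustrative only).
my $example = join "\t",
    24, 0, 0, 0,                  # matches, misMatches, repMatches, nCount
    0, 0, 1, 3,                   # qNumInsert, qBaseInsert, tNumInsert, tBaseInsert
    '+', 'query1', 120, 10, 34,   # strand, qName, qSize, qStart, qEnd
    'chr1', 1000000, 5000, 5027,  # tName, tSize, tStart, tEnd
    2, '12,12,', '10,22,', '5000,5015,';  # blockCount, blockSizes, qStarts, tStarts

print score_from_psl_line($example), "\n";    # prints 23
```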

If there is still confusion, could you let us know whether you're using gfServer with hgBlat, standalone blat or something else? Also, please provide your query sequence and the name of the assembly you're querying so we can attempt to replicate your query.

Please contact us again at gen...@soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group
--

