Different results for BLAT search depending on output type


Alex Reynolds

Nov 14, 2017, 1:42:07 PM
to gen...@soe.ucsc.edu
Hello UCSC Genome Browser team,

I have a question about running a web BLAT search and getting different results, based on the selection of output type.

We start with the following query sequence:

Inline image 2

As highlighted in the red box, "PSL" is the selected output type.

We submit the search. Here is a snapshot of the result:

Inline image 1

There is a red box around the Match column for the PSL-formatted result, to highlight the area of interest.

We run the query again on the same sequence, but select "Hyperlink" as the output type:

Inline image 3

Here is the result from that search:

Inline image 4

The target sequences are in the same order in the PSL and Hyperlink output tables. However, the Match column scores are different.

What is the difference between the Hyperlink and PSL output types that would change the Match results?

Regards,
Alex

Brian Lee

Nov 14, 2017, 4:04:21 PM
to Alex Reynolds, gen...@soe.ucsc.edu
Dear Alex,

Thank you for using the UCSC Genome Browser and for your question about the differences between the PSL output and the hyperlink output for BLAT results. Thank you also for the detailed examples explaining your situation.

When the results are displayed as PSL, the matches column does not represent the same information as the score column shown in the hyperlink display, which is derived from additional calculations. The two columns you have highlighted are therefore not expected to contain identical numbers.

The BLAT FAQ has an entry that might be of interest, written for those using the command-line version who wish to replicate this score column: http://genome.ucsc.edu/FAQ/FAQblat.html#blat4

That FAQ describes a pslScore step that converts the raw PSL output into the view you see when you select the (default) hyperlink option. Here is a link to pslScore.pl, if it is of interest: http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob;f=src/utils/pslScore/pslScore.pl. When you request PSL, this score is not included, as the format definition shows. Here is a link to the definition of matches in PSL: http://genome.ucsc.edu/FAQ/FAQformat.html#format2

You may also be interested to know that we recently made it possible to output BLAT results as custom tracks by clicking "Build a custom track with these results" on the "hyperlink" results page. This allows you to save separate BLAT results into a session with unique names (changing "blat YourSeq" to a unique title). You can then go to the Table Browser, select "group: Custom Tracks" and the track you have named, and change "output format:" to "all fields". The PSL will be output in the bigPsl pairwise alignment format, which you can read more about here and on an associated page.

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UC Santa Cruz Genomics Institute

Training videos & resources: http://genome.ucsc.edu/training/index.html

Alex Reynolds

Nov 14, 2017, 5:33:07 PM
to gen...@soe.ucsc.edu
Hi Brian,

Thanks for the quick response, and for a link to pslScore.pl and the links to the FAQ. 

It sounds like there are two types of scores produced: a "web BLAT" score and a "PSL" score. Whatever the score is from the PSL dataset, it is adjusted to generate the web BLAT score. Is that a correct interpretation?

I had already read the definition of matches in the PSL description, but I guess my larger question is what the difference is between the two scores: why would they be calculated differently (or why is one an adjustment of the other)? Does this change the interpretation of the matches score from the "web BLAT" dataset?

Thanks again for your help.

Regards,
Alex

Brian Lee

Nov 14, 2017, 5:47:36 PM
to Alex Reynolds, gen...@soe.ucsc.edu

Hi Alex,

Thanks for the follow-up. If you look closely at the PSL format (http://genome.ucsc.edu/FAQ/FAQformat.html#format2), you will see that there is no score field.

The matches column, that you might be thinking of as a score, is defined as "Number of bases that match that aren't repeats."

A further calculation takes the PSL data as input and produces the score, which is shown only on the "web BLAT" page. That score comes from a second step, in a process like the one in the shared pslScore script.

If you look at that script, toward the bottom there is an indication that beyond just the matches column, it also incorporates the repMatches, misMatches, qNumInsert, and tNumInsert in the PSL data.
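As a rough illustration of that calculation, here is a minimal Python sketch. It assumes the formula used by the pslScore utility in the kent source tree (matches plus half of repMatches, minus misMatches, minus the two insert counts) and the standard 21-column PSL field order. The names `PslCounts`, `parse_psl_counts`, and `psl_score` are made up for this example, and the `size_mul` parameter stands in for the factor of 3 that the kent code applies to protein alignments. Treat it as an approximation of what the script does, not as the Browser's exact code.

```python
from typing import NamedTuple


class PslCounts(NamedTuple):
    """The five PSL count fields that feed into the score (hypothetical helper type)."""
    matches: int
    mis_matches: int
    rep_matches: int
    q_num_insert: int
    t_num_insert: int


def parse_psl_counts(line: str) -> PslCounts:
    """Pull the count fields out of one tab-separated PSL data line.

    Assumed PSL column order: 0 matches, 1 misMatches, 2 repMatches,
    3 nCount, 4 qNumInsert, 5 qBaseInsert, 6 tNumInsert, 7 tBaseInsert, ...
    """
    f = line.rstrip("\n").split("\t")
    return PslCounts(
        matches=int(f[0]),
        mis_matches=int(f[1]),
        rep_matches=int(f[2]),
        q_num_insert=int(f[4]),
        t_num_insert=int(f[6]),
    )


def psl_score(p: PslCounts, size_mul: int = 1) -> int:
    """Score in the style of pslScore; size_mul=1 assumes a nucleotide alignment."""
    return (
        size_mul * (p.matches + (p.rep_matches >> 1))
        - size_mul * p.mis_matches
        - p.q_num_insert
        - p.t_num_insert
    )


# A made-up PSL line, for illustration only (not real BLAT output).
example = "95\t3\t4\t0\t1\t2\t1\t5\t+\tYourSeq\t104\t0\t104\tchr1\t1000\t10\t117\t2\t50,52,\t0,52,\t10,65,"
counts = parse_psl_counts(example)
print(counts.matches, psl_score(counts))  # matches is 95, score is 95 + 2 - 3 - 1 - 1 = 92
```

In this made-up row the matches column reads 95 while the derived score is 92, which is the kind of discrepancy described earlier in the thread.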

One way to think about it is that the PSL is the raw information. When the browser displays results as hyperlinks (not raw PSL data), it calculates a score from that raw information and sorts on it, so that the best results are near the top. If the number of matches alone determined the best result, the two outputs would sort identically, but sometimes other pieces of information need to be considered. You may wish to look in our archives to see some discussion about this topic: https://groups.google.com/a/soe.ucsc.edu/forum/#!searchin/genome/matches$20PSL$20score%7Csort:date

For example, here is an excerpt from our archives:

- Can I just take the alignment that has the highest number in the 
"match"-column and take this entry as the "best" alignment?

This is a valid approach. However, some subtleties will be missed. For 
instance, a perfect match that has no gaps on either the target or the 
query side would be treated the same as a match where each base matched 
perfectly, but the matches were interrupted by non-matching sequence.
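
The subtlety in that excerpt can be sketched with two made-up alignments that have the same matches value but differ in gaps. The sketch assumes the pslScore-style formula for nucleotide alignments (matches + repMatches/2 - misMatches - qNumInsert - tNumInsert); the dictionaries and the `psl_score` helper are hypothetical, not BLAT output or Browser code.

```python
def psl_score(hit: dict) -> int:
    """pslScore-style score for a nucleotide alignment (assumed formula)."""
    return (
        hit["matches"]
        + (hit["repMatches"] >> 1)
        - hit["misMatches"]
        - hit["qNumInsert"]
        - hit["tNumInsert"]
    )


# Two invented hits: both have 100 matching bases, but one is interrupted by gaps.
hits = [
    {"name": "gapped", "matches": 100, "misMatches": 0, "repMatches": 0,
     "qNumInsert": 2, "tNumInsert": 3},
    {"name": "ungapped", "matches": 100, "misMatches": 0, "repMatches": 0,
     "qNumInsert": 0, "tNumInsert": 0},
]

# Ranking on matches alone cannot separate the two hits (they tie at 100).
by_matches = sorted(hits, key=lambda h: h["matches"], reverse=True)

# Ranking on the derived score places the ungapped hit first (100 vs. 95).
by_score = sorted(hits, key=psl_score, reverse=True)

for h in by_score:
    print(h["name"], h["matches"], psl_score(h))
```

Sorting on matches leaves the two hits tied, while sorting on the score moves the uninterrupted alignment to the top.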

I hope this was helpful.

All the best,

Brian


Alex Reynolds

Nov 14, 2017, 6:34:08 PM
to gen...@soe.ucsc.edu
Many thanks! This was supremely useful and clears up a great deal about the difference.

Regards,
Alex