UCSC BLAT appears to handle split alignment correctly for shorter read sequences, but not for longer reads of the same region


X L

Jul 9, 2025, 2:28:56 PM
to UCSC Genome Browser Public Support
Dear UCSC BLAT developers and users,

I have noticed unexpected behavior in the BLAT tool. Specifically, BLAT produces a split alignment for a shorter read but not for a longer read that covers the same region.

For example, using the hg38 reference genome:

Read1 (150nt):
TGCCAAGGTTTCAAGTGATCCTCCCGCCTCAGCCTGCCCAGGTGCTGAGATTACATGTATGAGCCACTGCACCTGGAAAGGAGCCAGAAATGTGAAGTGCTAGCTGAAGGATGAGCAGCAGCTAGCCAGGCAAAGGCGGGGAGACAACCC

Read2 (70nt):
GAGCCAGAAATGTGAAGTGCTAGCTGAAGGATGAGCAGCAGCTAGCCAGGCAAAGGCGGGGAGACAACCC

Read2 is simply the 3′ portion of Read1. However, BLAT performs split alignment for Read2 but not for Read1. A screenshot from IGV illustrating this behavior is attached below:
Screenshot 2025-07-09 at 11.41.26 AM.png
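For anyone reproducing this, the relationship between the two reads can be checked before they are submitted to BLAT. The snippet below is a minimal sketch in plain Python; the variable names are only illustrative, and the sequences are copied from the post above.

```python
# Sequences copied verbatim from the post; names READ1/READ2 are illustrative.
READ1 = (
    "TGCCAAGGTTTCAAGTGATCCTCCCGCCTCAGCCTGCCCAGGTGCTGAGATTACATGTATGAGCCACTGC"
    "ACCTGGAAAGGAGCCAGAAATGTGAAGTGCTAGCTGAAGGATGAGCAGCAGCTAGCCAGGCAAAGGCGGG"
    "GAGACAACCC"
)
READ2 = "GAGCCAGAAATGTGAAGTGCTAGCTGAAGGATGAGCAGCAGCTAGCCAGGCAAAGGCGGGGAGACAACCC"

# Lengths of both reads, and whether Read2 is exactly the 3' end of Read1.
is_suffix = READ1.endswith(READ2)
# 0-based position in Read1 where Read2 starts (-1 if it does not occur).
offset = READ1.find(READ2)

print(f"Read1: {len(READ1)} nt, Read2: {len(READ2)} nt")
print(f"Read2 is the 3' end of Read1: {is_suffix} (starts at offset {offset})")
```

If the check passes, any difference between the two BLAT results comes from how the alignment is built, not from the input sequences.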

Is this expected behavior? I would greatly appreciate any insight or suggestions.

Best regards,
Xiao

Jairo Navarro Gonzalez

Jul 15, 2025, 5:50:53 PM
to X L, UCSC Genome Browser Public Support

Hello,

Thank you for using the UCSC Genome Browser and sending your inquiry.

BLAT tries to find the best-scoring alignment between the target and query sequences. Sometimes the best-scoring alignment accepts a penalty and opens a gap between the sequences. I looked at the alignment for the short sequence that you shared, and the last block in that alignment consists of 15 nucleotides:

GCGGGGA GACAACCC
https://genome.ucsc.edu/cgi-bin/hgc?o=45552508&g=htcUserAli&i=../trash/hgSs/hgSs_genome_3629c2_6cb550.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_3629c2_6cb550.fa+YourSeq&c=chr1&l=45552508&r=45567996&db=hg38&hgsid=2725170376_N6lFJoXiehHh2MyWPdF0GZzOmpAz

When you search for the long sequence, you will see that BLAT leaves the bases that formed the "split" block out of the alignment instead of opening a gap, because that gives it a better match:

https://genome.ucsc.edu/cgi-bin/hgc?o=45552433&g=htcUserAli&i=../trash/hgSs/hgSs_genome_3729f0_6cc7a0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_3729f0_6cc7a0.fa+YourSeq&c=chr1&l=45552433&r=45552564&db=hg38&hgsid=2725178444_N0qsssGAAJeK3Uo9wGZ99bZVLpof

cggg gagacaaccc
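To compare the two results outside the browser, one option is to read the block structure from BLAT's PSL output, where column 19 holds the block sizes, column 21 the target block starts, and columns 11 to 13 the query size, start, and end. The sketch below assumes tab-separated PSL lines without the header. The two example lines are hypothetical placeholders (the target name `chrT` and all coordinates are invented), not the actual output for these reads.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PslSummary:
    q_name: str
    block_sizes: List[int]   # length of each aligned block
    target_gaps: List[int]   # target bases skipped between consecutive blocks
    unaligned_start: int     # query bases before the first aligned base
    unaligned_end: int       # query bases after the last aligned base


def summarize_psl(line: str) -> PslSummary:
    """Summarize one tab-separated PSL record (21 columns, no header)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 21:
        raise ValueError(f"expected 21 PSL columns, got {len(cols)}")
    q_size, q_start, q_end = int(cols[10]), int(cols[11]), int(cols[12])
    # blockSizes and tStarts are comma-separated lists with a trailing comma.
    block_sizes = [int(x) for x in cols[18].split(",") if x]
    t_starts = [int(x) for x in cols[20].split(",") if x]
    target_gaps = [
        t_starts[i + 1] - (t_starts[i] + block_sizes[i])
        for i in range(len(block_sizes) - 1)
    ]
    return PslSummary(cols[9], block_sizes, target_gaps, q_start, q_size - q_end)


# Hypothetical records for illustration only (invented target and coordinates).
SHORT_PSL = "\t".join([
    "70", "0", "0", "0", "0", "0", "1", "14945", "+", "read2", "70", "0", "70",
    "chrT", "100000", "1000", "16015", "2", "55,15,", "0,55,", "1000,16000,",
])
LONG_PSL = "\t".join([
    "136", "0", "0", "0", "0", "0", "0", "0", "+", "read1", "150", "0", "136",
    "chrT", "100000", "925", "1061", "1", "136,", "0,", "925,",
])

for record in (SHORT_PSL, LONG_PSL):
    print(summarize_psl(record))
```

In this made-up example the first record shows a short final block after a large target gap, while the second ends early and reports the remaining query bases as unaligned, which is the pattern described in the two alignments above.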

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro
UCSC Genome Browser

