Discrepancies: Standalone BLAT -out=blast8 versus -out=pslx versus webblat

Guinevere Q. Lee

Mar 5, 2021, 12:04:29 PM
to UCSC Genome Browser Discussion List
Hi UCSC team,

I installed the standalone BLAT and ran these two commands to find the hg38 coordinates of the two sequences in the attached FASTA file. These are HIV integration site sequences: half of each sequence is viral and half is human.

First sequence:
(1)
./blat hg38.2bit  TestCase_n2.fasta  -out=blast8  TestCase_n4_BLAT.txt
Returns chr13 Tstart 67486292 Tend 67485903 with missing strand information (see attached)
(2)
./blat hg38.2bit  TestCase_n2.fasta  -out=pslx  TestCase_n4_BLAT.txt
Returns chr13 Tstart 67485902 Tend 67486292 and the strand call is minus (see attached)
*Note the discrepancies: Tstart and Tend are flipped, and the coordinate is 67485903 in one output versus 67485902 in the other.
(3)
I also entered the first sequence into web BLAT.
Returns chr13 Tstart 67486292 Tend 67485903 and the strand call is minus
(4)
Then I used twoBitToFa to retrieve chr13:67485902-67485907 from hg38.2bit. It returned GAAAG, and its reverse complement, CTTTC, matched the beginning of the sequence in the pslx output.
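A minimal Python sketch of the check in step (4), assuming (as the 5-base result suggests) that twoBitToFa reads chr13:67485902-67485907 as a 0-start, half-open range. The names below are illustrative only and are not part of any UCSC tool.

```python
# Complement table for A, C, G, T and N; other IUPAC codes pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Read as 0-start, half-open, chr13:67485902-67485907 covers 5 bases (the length of GAAAG).
    print(67485907 - 67485902)
    # GAAAG comes from the plus strand; the pslx sequence for this minus-strand
    # hit began with its reverse complement.
    print(reverse_complement("GAAAG"))  # CTTTC
```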

Second sequence:
Methods (1)-(3) returned, respectively:
blast8:     chr13 Tstart 67485907 Tend 67484585
pslx:       chr13 - Tstart 67484584 Tend 67485907
webblat:  chr13 - Tstart 67484585 Tend 67485907

I am very confused. Does anyone know what is going on? Specifically:
Question 1. Why am I seeing a shift of one?
Question 2. Why are Tstart and Tend flipped in some methods?
Question 3. If I use -out=blast8, how do I get the plus/minus strand information?

Thank you so much!

Best,
Guin

Attachments:
TestCase_Output_ver09_n2_blast8.txt
TestCase_n2.fasta
TestCase_Output_ver09_n2_pslx.txt

Luis Nassar

Mar 11, 2021, 4:03:48 PM
to Guinevere Q. Lee, UCSC Genome Browser Discussion List

Hello, Guin.

Thank you for your interest in the Genome Browser and its tools.

As to your first question, the shift of one is due to the different coordinate systems used by different output types. You are using two output types from standalone BLAT (pslx and blast8), plus web BLAT, which uses the hyperlink output. These use the following coordinate systems:

  • blast8 - 1-start, fully-closed
  • hyperlink - 1-start, fully-closed
  • pslx - 0-start, half-open

We have a blog post which offers more details on these different coordinate systems (http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/).
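As a minimal sketch (Python; the helper names are hypothetical, not part of BLAT), converting between the two systems only shifts the start coordinate by one, and each system has its own length formula. Applied to the first sequence, the blast8/web BLAT range 67485903-67486292 (smaller coordinate first) becomes the pslx range 67485902-67486292, and both describe the same 390 bases.

```python
from typing import Tuple

# Hypothetical helpers; every interval is (start, end) with start <= end.


def closed1_to_halfopen0(start1: int, end1: int) -> Tuple[int, int]:
    """1-start, fully-closed (blast8, web BLAT) -> 0-start, half-open (pslx)."""
    return start1 - 1, end1


def halfopen0_to_closed1(start0: int, end: int) -> Tuple[int, int]:
    """0-start, half-open (pslx) -> 1-start, fully-closed (blast8, web BLAT)."""
    return start0 + 1, end


def closed1_length(start1: int, end1: int) -> int:
    # Both endpoints are included, so add one.
    return end1 - start1 + 1


def halfopen0_length(start0: int, end: int) -> int:
    # The end position is excluded, so the plain difference is the length.
    return end - start0


if __name__ == "__main__":
    print(closed1_to_halfopen0(67485903, 67486292))  # (67485902, 67486292)
    print(closed1_length(67485903, 67486292))        # 390
    print(halfopen0_length(67485902, 67486292))      # 390
```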

Your second and third questions have the same answer. The blast8 output has no strand column; instead, it indicates negative-strand alignments by reversing the start and end coordinates, so that the end coordinate is less than the start. In your example:

blast8: chr13 Tstart 67485907 Tend 67484585
pslx: chr13 - Tstart 67484584 Tend 67485907
webblat: chr13 - Tstart 67484585 Tend 67485907

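If you subtract Tstart from Tend for pslx, you get an item size of 1323 bp. Doing the same for webblat gives 1322, and for blast8 it gives -1322. The negative sign conveys that the alignment is on the negative strand, and the 1-bp difference is due to the 1-start, fully-closed coordinates, where the size is |Tend - Tstart| + 1 = 1323 bp. So, to get the strand from blast8 output, compare the two coordinates: if Tstart is greater than Tend, the alignment is on the minus strand.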
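A minimal sketch of recovering the strand from -out=blast8 output, assuming the usual 12-column tab-separated layout in which the target start and end are the 9th and 10th columns. The function names are hypothetical. Besides the strand, it converts the target range to pslx-style 0-start, half-open coordinates.

```python
from typing import Tuple


def blast8_target_interval(t_start: int, t_end: int) -> Tuple[str, int, int]:
    """blast8 target coords (1-start, fully-closed, reversed on the minus strand)
    -> (strand, start0, end) in pslx-style 0-start, half-open coordinates."""
    # A 1-bp hit (t_start == t_end) carries no strand information; it is reported as "+".
    strand = "-" if t_start > t_end else "+"
    low, high = min(t_start, t_end), max(t_start, t_end)
    return strand, low - 1, high


def parse_blast8_line(line: str) -> Tuple[str, str, str, int, int]:
    """Return (query, target, strand, start0, end) from one tab-separated blast8 line.
    Assumes the 12-column layout: tStart and tEnd are fields[8] and fields[9]."""
    fields = line.rstrip("\n").split("\t")
    strand, start0, end = blast8_target_interval(int(fields[8]), int(fields[9]))
    return fields[0], fields[1], strand, start0, end


if __name__ == "__main__":
    # Second sequence from this thread: blast8 reported Tstart 67485907, Tend 67484585.
    strand, start0, end = blast8_target_interval(67485907, 67484585)
    print(strand, start0, end, end - start0)  # - 67484584 67485907 1323
```

The result matches the pslx line for the same sequence (minus strand, 67484584 to 67485907).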

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.