blat

Jay Yuan

Jan 20, 2017, 10:18:31 AM
to gen...@soe.ucsc.edu
Hello UCSC Genome Browser support team,

1. When you create the index for BLAT, you say "By default, the index consists of all non-overlapping 11-mers except for those heavily involved in repeats" (https://genome.ucsc.edu/FAQ/FAQblat.html).
My question is how "heavily involved in repeats" is measured. Do you mean that if a k-mer has n loci in the genome, it will be discarded? If so, how big is n by default?

2. In your paper, you say "Hits that are within the gap limit are bundled together into proto-clumps". How big is the gap limit by default?

3. In your paper, you also say "Hits within proto-clumps are then sorted along the database coordinate and put into real clumps if they are within the window limit on the database coordinate". How big is the window limit?

Thanks for your help,

Yuan

Jairo Navarro Gonzalez

Jan 27, 2017, 5:13:22 PM
to Jay Yuan, gen...@soe.ucsc.edu

Dear Yuan,

Thank you for using the UCSC Genome Browser and for your questions about BLAT.
You can find some of the information you are looking for in the BLAT specification documentation.

The -repMatch=N option sets the number of repetitions of a tile allowed before it is marked as overused. Typically this is 256 for tileSize 12, 1024 for tileSize 11, and 4096 for tileSize 10. The default is 1024, since the default tile size for BLAT is 11. BLAT also has tile over-use limits that affect the results; these are documented under the -ooc=N.ooc option. Even if you set -repMatch to an enormously high value, BLAT may still filter out tiles that receive a large number of hits, because internal heuristics may limit the maximum number of hits per tile.
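
To make the repeat cutoff concrete, here is a minimal Python sketch that models -repMatch as a simple count threshold: it splits a target sequence into non-overlapping tiles, counts each tile, and flags tiles that occur more than `rep_match` times. It also reproduces the typical values quoted above, which grow by a factor of 4 for each base removed from the tile, presumably because a given k-mer is expected about 4 times as often in random sequence when k drops by one. The function names, the strict "more than" comparison, and any extrapolation beyond tile sizes 10 to 12 are assumptions; real BLAT also applies the internal heuristics mentioned above, so this is a model, not BLAT's implementation.

```python
from collections import Counter
from typing import Iterator, Optional, Set


def typical_rep_match(tile_size: int) -> int:
    """Typical -repMatch value for a tile size (hypothetical helper).

    Assumption: extrapolates the quoted 256/1024/4096 values for tile sizes
    12/11/10 as 256 * 4**(12 - tile_size); only those three are documented.
    """
    return 256 * 4 ** (12 - tile_size)


def non_overlapping_tiles(seq: str, tile_size: int) -> Iterator[str]:
    """Yield non-overlapping tiles, stepping by tile_size along the sequence."""
    seq = seq.upper()
    for start in range(0, len(seq) - tile_size + 1, tile_size):
        yield seq[start:start + tile_size]


def overused_tiles(seq: str, tile_size: int = 11,
                   rep_match: Optional[int] = None) -> Set[str]:
    """Return tiles occurring more than rep_match times (simplified model)."""
    if rep_match is None:
        rep_match = typical_rep_match(tile_size)
    counts = Counter(non_overlapping_tiles(seq, tile_size))
    return {tile for tile, n in counts.items() if n > rep_match}


if __name__ == "__main__":
    for size in (12, 11, 10):
        print(size, typical_rep_match(size))
    # Toy "genome": one 11-mer repeated 5 times, then a unique 11-mer.
    genome = "ACGTACGTACG" * 5 + "TTGACCAGTAC"
    print(overused_tiles(genome, tile_size=11, rep_match=3))  # {'ACGTACGTACG'}
```

In this model, the "n" from the question corresponds to `rep_match`: a tile seen more than that many times is treated as a repeat and left out of the index.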

Usually, two nearby hits on the same "diagonal" constitute a clump worthy of further investigation, in which a slow but effective dynamic programming algorithm tries to extend the hits into full alignments. The gap limit within a clump is controlled by the -maxGap option. The maximum gap between a pair of hits in a clump is small (0, 1, or 2); by default it is set to 2 for nucleotide sequences and 1 for protein sequences. This lets BLAT tolerate small indels, while large gaps, such as those between exons, are handled in a much later step that chains blocks into full gene alignments.
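
The notion of hits on the same "diagonal" can be illustrated with a short sketch. A hit is a tile match at query position q and target position t, and its diagonal is t - q; an insertion or deletion of d bases between two hits shifts the diagonal by d. In the simplified Python model below (not BLAT's code), two hits are grouped when their diagonals differ by at most `max_gap`, which plays the role of -maxGap, and their target positions are within `near` bases of each other. The `Hit` class, the `same_clump` function, and the `near` distance are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    q_start: int  # position of the tile in the query
    t_start: int  # position of the tile in the target (database)

    @property
    def diagonal(self) -> int:
        # Hits from one ungapped alignment share the same t_start - q_start.
        return self.t_start - self.q_start


def same_clump(a: Hit, b: Hit, max_gap: int = 2, near: int = 300) -> bool:
    """True if two hits are close enough to seed one alignment (model only).

    max_gap: tolerated diagonal shift, i.e. the largest indel between the hits.
    near:    hypothetical limit on their distance along the target.
    """
    return (abs(a.diagonal - b.diagonal) <= max_gap
            and abs(a.t_start - b.t_start) <= near)


if __name__ == "__main__":
    a = Hit(q_start=0, t_start=1000)
    b = Hit(q_start=22, t_start=1023)   # one extra base in the target
    c = Hit(q_start=44, t_start=5044)   # large, intron-sized jump
    print(same_clump(a, b))  # True: diagonals 1000 vs 1001
    print(same_clump(a, c))  # False: diagonals differ by 4000
```

With `max_gap=2`, the 1-base indel between `a` and `b` is tolerated, while the large jump to `c` is left for the later chaining step.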

The formation of proto-clumps is a heuristic internal to BLAT and cannot be altered. If you would like to know the values of these variables, you will have to read the BLAT source code. Further evidence that internal heuristics control the formation of proto-clumps is that there is no BLAT command-line option to change them.
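
Because the proto-clump parameters are internal, any concrete values are guesses, but the two-stage procedure quoted from the paper in the question can still be sketched. The Python below sorts hits by diagonal and starts a new proto-clump whenever consecutive diagonals differ by more than `gap_limit`; it then sorts each proto-clump by target coordinate and splits it wherever consecutive hits are more than `window` bases apart, keeping groups with at least `min_hits` hits. The function names, the default values of `gap_limit`, `window`, and `min_hits`, and this exact splitting rule are hypothetical assumptions, not values read from the BLAT source.

```python
from typing import List, Tuple

Hit = Tuple[int, int]  # (query_pos, target_pos)


def diagonal(hit: Hit) -> int:
    q, t = hit
    return t - q


def proto_clumps(hits: List[Hit], gap_limit: int) -> List[List[Hit]]:
    """Stage 1: group hits whose consecutive diagonals differ by <= gap_limit."""
    groups: List[List[Hit]] = []
    for hit in sorted(hits, key=diagonal):
        if groups and diagonal(hit) - diagonal(groups[-1][-1]) <= gap_limit:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups


def clumps(hits: List[Hit], gap_limit: int = 2, window: int = 500,
           min_hits: int = 2) -> List[List[Hit]]:
    """Stage 2: split each proto-clump along the target using a window limit."""
    result: List[List[Hit]] = []
    for group in proto_clumps(hits, gap_limit):
        current: List[Hit] = []
        for hit in sorted(group, key=lambda h: h[1]):
            # A jump larger than the window along the target closes the clump.
            if current and hit[1] - current[-1][1] > window:
                if len(current) >= min_hits:
                    result.append(current)
                current = []
            current.append(hit)
        if len(current) >= min_hits:
            result.append(current)
    return result


if __name__ == "__main__":
    hits = [(0, 1000), (11, 1011), (22, 1023),  # diagonals 1000/1000/1001
            (3000, 4000), (3011, 4011),         # same diagonal, 3 kb further on
            (5, 50000)]                         # isolated hit, dropped
    for c in clumps(hits):
        print(c)
    # [(0, 1000), (11, 1011), (22, 1023)]
    # [(3000, 4000), (3011, 4011)]
```

Rerunning with `gap_limit=0` drops the hit at (22, 1023) from the first clump, since its diagonal differs by one from its neighbors'.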

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro
UCSC Genomics Institute


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser discussion list" group.
