Dear Yuan,
Thank you for using the UCSC Genome Browser and for your questions about BLAT.
You can find some of the information that you are looking for in the BLAT specification documentation.
The -repMatch=N option sets the number of repetitions of a tile allowed before it is marked as overused. Typically this is 256 for tile size 12, 1024 for tile size 11, and 4096 for tile size 10. The default for this option is 1024 because the default tile size for BLAT is 11. BLAT also has tile over-use limits that affect the results, which are documented under the -ooc=N.ooc option. Even if you set -repMatch to an enormously high value, BLAT might still filter out tiles that get a large number of hits, since it may have internal heuristics that limit the maximum number of hits for a tile.
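The three typical values quoted above fit a simple pattern: each extra base in the tile divides the value by 4. A minimal Python sketch of that rule of thumb follows. The function name typical_rep_match is hypothetical, and extending the pattern beyond tile sizes 10 to 12 is an assumption, not something BLAT documents.

```python
def typical_rep_match(tile_size: int) -> int:
    """Rule-of-thumb -repMatch value for a given -tileSize.

    Fits the three values quoted in this message (12 -> 256, 11 -> 1024,
    10 -> 4096). Other tile sizes are an extrapolation (assumption).
    """
    if not 1 <= tile_size <= 16:
        raise ValueError("tile_size outside the range this sketch covers")
    # One extra base in the tile makes a given tile about 4x rarer,
    # so the typical threshold drops by a factor of 4.
    return 4 ** (16 - tile_size)


if __name__ == "__main__":
    for size in (10, 11, 12):
        print(size, typical_rep_match(size))
```

The result can be passed as the N in -repMatch=N when you change the tile size away from the default of 11.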
Usually, two nearby hits on the same "diagonal" constitute a clump worthy of further investigation, in which a slow but effective dynamic programming algorithm tries to extend the hits into full alignments. The gap limit within a clump is controlled by the -maxGap option. The maximum gap between a pair of hits in a clump is small (0, 1, or 2), and by default the option is set to 2 for nucleotide sequences and 1 for protein sequences. With this option, BLAT can tolerate small indels, but large gaps, such as those between exons, are handled in a much later step that chains blocks into full gene alignments.
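To make the idea of a diagonal concrete, here is a toy Python sketch. It is not BLAT's implementation: the function clump_hits, its parameters, and the simplification of ignoring how far apart the hits are along the diagonal are all assumptions made for illustration. A hit is a (query_start, target_start) pair, its diagonal is target_start minus query_start, and a small indel shifts the diagonal by the indel length.

```python
from typing import List, Tuple

Hit = Tuple[int, int]  # (query_start, target_start) of one matching tile


def diagonal(hit: Hit) -> int:
    """Diagonal of a hit: target_start - query_start."""
    return hit[1] - hit[0]


def clump_hits(hits: List[Hit], max_gap: int = 2, min_hits: int = 2) -> List[List[Hit]]:
    """Toy clumping: group hits whose consecutive sorted diagonals
    differ by at most max_gap, and keep groups with >= min_hits hits."""
    ordered = sorted(hits, key=lambda h: (diagonal(h), h[0]))
    clumps: List[List[Hit]] = []
    current: List[Hit] = []
    for hit in ordered:
        # Start a new group when the diagonal jumps by more than max_gap.
        if current and diagonal(hit) - diagonal(current[-1]) > max_gap:
            if len(current) >= min_hits:
                clumps.append(current)
            current = []
        current.append(hit)
    if len(current) >= min_hits:
        clumps.append(current)
    return clumps


if __name__ == "__main__":
    # Two hits on diagonal 100, one shifted by a 1-base indel, one unrelated.
    hits = [(0, 100), (11, 111), (22, 123), (50, 900)]
    print(clump_hits(hits, max_gap=2))
    print(clump_hits(hits, max_gap=0))
```

With max_gap=2 the hit shifted by the 1-base indel stays in the same group as the two hits on diagonal 100. With max_gap=0 it is split off and dropped as a singleton.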
The formation of proto-clumps is a heuristic internal to BLAT and cannot be altered: there is no BLAT command-line option to control or change it. If you would like to know the values of the variables involved, you will have to read the BLAT source code.
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Jairo Navarro
UCSC Genomics Institute