BLAT gfServer configuration


Inbal Paz

Jan 8, 2014, 6:57:03 AM
to gen...@soe.ucsc.edu

Hi,

I'm using BLAT locally with gfServer.

I've encountered several cases, mostly with short sequences, where the local BLAT doesn't find a match for the query sequence but the web-based BLAT does.

I'm interested only in 100%-identity matches, in order to map the query sequence to the genome.

For example the following sequence:

>hsa-mir-17 MI0000071 Homo sapiens miR-17 stem-loop

GUCAGAAUAAUGUCAAAGUGCUUACAGUGCAGGUAGUGAUAUGUGCAUCUACUGCAGUGAAGGCACUUGUAGCAUUAUGGUGAC
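The miRBase sequence is written in the RNA alphabet (U instead of T). Whether gfClient converts U to T on its own is not addressed in this thread, so one way to remove that variable when comparing local and web runs is to convert the query before sending it. A minimal sketch, assuming a plain FASTA query file is what you pass to gfClient:

```python
# Precursor sequence from the post above (miRBase uses U rather than T).
MIR17_RNA = (
    "GUCAGAAUAAUGUCAAAGUGCUUACAGUGCAGGUAGUGAUAUGUGCAUCU"
    "ACUGCAGUGAAGGCACUUGUAGCAUUAUGGUGAC"
)


def rna_to_dna(seq: str) -> str:
    """Return seq with U/u replaced by T/t (DNA alphabet)."""
    return seq.replace("U", "T").replace("u", "t")


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one record as FASTA text, wrapping the sequence at `width` columns."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the converted record; redirect stdout to a .fa file for gfClient.
    print(to_fasta("hsa-mir-17", rna_to_dna(MIR17_RNA)), end="")
```

The converted sequence is 84 bases long, matching the QSIZE of 84 reported by both the web and the local searches.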

Here is the correct target, as reported by the web BLAT:

QUERY          SCORE START   END QSIZE IDENTITY  CHRO STRAND     START       END   SPAN
hsa-mir-17        84     1    84    84 100.0%    13   +   92002859  92002942     84
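When comparing this web result with local gfClient output, note that the SPAN column (92002942 - 92002859 + 1 = 84) shows the web coordinates are 1-based and inclusive, whereas PSL files use 0-based, half-open coordinates. A minimal sketch that parses the row and converts it, assuming the web table's column order is QUERY SCORE START END QSIZE IDENTITY CHRO STRAND START END SPAN, and assuming the local genome names the chromosome "chr13":

```python
# One result row from the web BLAT page, copied from the post above.
WEB_ROW = "hsa-mir-17        84     1    84    84 100.0%    13   +   92002859  92002942     84"

# Assumed column order of the web BLAT results table.
FIELDS = ("query", "score", "q_start", "q_end", "q_size",
          "identity", "chrom", "strand", "t_start", "t_end", "span")


def parse_web_row(row: str) -> dict:
    values = row.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    rec = dict(zip(FIELDS, values))
    for key in ("score", "q_start", "q_end", "q_size", "t_start", "t_end", "span"):
        rec[key] = int(rec[key])
    rec["identity"] = float(rec["identity"].rstrip("%"))
    return rec


def is_exact_full_length(rec: dict) -> bool:
    """100% identity, and the 1-based inclusive query range covers the whole query."""
    return (rec["identity"] == 100.0
            and rec["q_start"] == 1
            and rec["q_end"] == rec["q_size"])


def to_psl_target(rec: dict) -> tuple:
    """Convert the 1-based inclusive target range to PSL's 0-based half-open form.

    Prefixing "chr" is an assumption about how the local genome names sequences.
    """
    return ("chr" + rec["chrom"], rec["t_start"] - 1, rec["t_end"])


rec = parse_web_row(WEB_ROW)
print(is_exact_full_length(rec), to_psl_target(rec))
# prints: True ('chr13', 92002858, 92002942)
```

So the same hit would appear in a local PSL file as tStart 92002858, tEnd 92002942 on chr13, which is the line to look for in gfClient output.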

I tried running gfServer with both stepSize=11 (the default) and stepSize=5, keeping tileSize=11 in both cases, and neither configuration returned the correct match.
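To reason about what tileSize and stepSize change, here is a simplified model, not gfServer's actual code: assume the server indexes tiles of length tileSize starting at target offsets 0, stepSize, 2*stepSize, ..., while the query is scanned at every offset. An exact match is then guaranteed to contain a whole indexed tile only if it is at least tileSize + stepSize - 1 bases long. The function names below are hypothetical.

```python
def min_guaranteed_match(tile_size: int, step_size: int, tiles_needed: int = 1) -> int:
    """Shortest exact match that always contains `tiles_needed` whole indexed tiles.

    Simplified model: tiles of length `tile_size` are indexed at target offsets
    0, step_size, 2*step_size, ...; the query is scanned at every offset.
    """
    return tiles_needed * step_size + tile_size - 1


def tiles_inside(match_start: int, match_len: int, tile_size: int, step_size: int) -> int:
    """Count indexed tiles lying entirely inside target[match_start, match_start + match_len)."""
    p = -(-match_start // step_size) * step_size  # first indexed offset >= match_start
    count = 0
    while p + tile_size <= match_start + match_len:
        count += 1
        p += step_size
    return count


def worst_case_tiles(match_len: int, tile_size: int, step_size: int) -> int:
    """Fewest whole tiles over every possible offset of the match relative to the tile grid."""
    return min(tiles_inside(a, match_len, tile_size, step_size) for a in range(step_size))


for step in (11, 5):
    need = min_guaranteed_match(11, step)
    print(f"tileSize=11 stepSize={step}: exact matches of {need}+ bp always contain a tile; "
          f"an 84 bp match contains at least {worst_case_tiles(84, 11, step)} tiles")
```

Under this model an 84-base perfect match contains at least 6 indexed tiles at stepSize=11 and at least 14 at stepSize=5, so step size alone would not obviously explain a missing chr13 hit. The real server applies further rules (gfServer options such as -minMatch and -repMatch affect how many tile hits are required and how over-represented tiles are treated), so treat this only as an illustration of the tile/step trade-off.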

For stepSize=11 I got the following results:

match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q               Q       Q       Q       T               T       T       T       block   blockSizes      qStarts  tStarts
        match   match           count   bases   count   bases           name            size    start   end     name            size    start   end     count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
20      0       0       0       0       0       1       2       -       hsa-mir-17      84      51      71      chr2    243199373       10467215        10467237        2       9,11,   13,22,  10467215,10467226,
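The stepSize=11 hit above is a partial alignment: 20 matching bases from query positions 51 to 71 (out of 84), on chr2 rather than chr13, with one 2-base gap in the target. A minimal sketch that parses a row using the standard 21 PSL columns and checks the usual PSL bookkeeping (aligned bases = sum of blockSizes = matches + misMatches + repMatches + N's; each span = aligned bases + inserted bases). The class and function names are illustrative, not part of any UCSC tool.

```python
from dataclasses import dataclass
from typing import List

# The single stepSize=11 hit from the post above (PSL, whitespace-separated).
CHR2_HIT = ("20 0 0 0 0 0 1 2 - hsa-mir-17 84 51 71 chr2 243199373 "
            "10467215 10467237 2 9,11, 13,22, 10467215,10467226,")


@dataclass
class PslHit:
    matches: int
    mismatches: int
    rep_matches: int
    n_count: int
    q_num_insert: int
    q_base_insert: int
    t_num_insert: int
    t_base_insert: int
    strand: str
    q_name: str
    q_size: int
    q_start: int
    q_end: int
    t_name: str
    t_size: int
    t_start: int
    t_end: int
    block_sizes: List[int]
    q_starts: List[int]
    t_starts: List[int]


def _int_list(field: str) -> List[int]:
    # PSL list columns end with a trailing comma, e.g. "9,11,"
    return [int(x) for x in field.split(",") if x]


def parse_psl(line: str) -> PslHit:
    f = line.split()
    if len(f) != 21:
        raise ValueError(f"expected 21 PSL columns, got {len(f)}")
    hit = PslHit(*(int(x) for x in f[:8]), f[8], f[9],
                 int(f[10]), int(f[11]), int(f[12]),
                 f[13], int(f[14]), int(f[15]), int(f[16]),
                 _int_list(f[18]), _int_list(f[19]), _int_list(f[20]))
    if len(hit.block_sizes) != int(f[17]):
        raise ValueError("blockCount disagrees with blockSizes")
    return hit


def is_consistent(h: PslHit) -> bool:
    """Check that block sizes, counts and spans agree with each other."""
    aligned = sum(h.block_sizes)
    return (aligned == h.matches + h.mismatches + h.rep_matches + h.n_count
            and h.q_end - h.q_start == aligned + h.q_base_insert
            and h.t_end - h.t_start == aligned + h.t_base_insert)


def query_coverage(h: PslHit) -> float:
    """Fraction of the query spanned by the alignment."""
    return (h.q_end - h.q_start) / h.q_size


hit = parse_psl(CHR2_HIT)
print(hit.t_name, is_consistent(hit), f"{query_coverage(hit):.1%}")
```

The row is internally consistent (9 + 11 = 20 aligned bases; target span 22 = 20 + 2), but it covers only about 24% of the query, so it is not the full-length mapping the web BLAT found.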

For stepSize=5 I got the following results:

match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q               Q       Q       Q       T               T       T       T       block   blockSizes      qStarts  tStarts
        match   match           count   bases   count   bases           name            size    start   end     name            size    start   end     count
---------------------------------------------------------------------------------------------------------------------------------------------------------------
16      0       0       0       0       0       0       0       -       hsa-mir-17      84      3       19      chr10   135534747       92081785        92081801        1       16,     65,     92081785,
18      0       0       0       0       0       0       0       -       hsa-mir-17      84      51      69      chr12   133851895       211640  211658  1       18,     15,     211640,
16      0       0       0       0       0       0       0       -       hsa-mir-17      84      40      56      chr15   102531392       55368705        55368721        1       16,     28,     55368705,
18      0       0       0       0       0       0       0       -       hsa-mir-17      84      0       18      chr5    180915260       170575679       170575697       1       18,     66,     170575679,
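Since only 100%-identity, full-length mappings are of interest, a simple post-filter on the PSL columns makes it explicit that none of the four stepSize=5 hits qualifies: each covers just 16 or 18 of the 84 query bases. A minimal sketch; the definition of "exact full-length" here (no mismatches, N's or gaps, and the alignment spans the whole query) is an assumption chosen for this use case, not a UCSC definition.

```python
# The four stepSize=5 hits from the post above (PSL, whitespace-separated).
STEP5_HITS = """\
16 0 0 0 0 0 0 0 - hsa-mir-17 84 3 19 chr10 135534747 92081785 92081801 1 16, 65, 92081785,
18 0 0 0 0 0 0 0 - hsa-mir-17 84 51 69 chr12 133851895 211640 211658 1 18, 15, 211640,
16 0 0 0 0 0 0 0 - hsa-mir-17 84 40 56 chr15 102531392 55368705 55368721 1 16, 28, 55368705,
18 0 0 0 0 0 0 0 - hsa-mir-17 84 0 18 chr5 180915260 170575679 170575697 1 18, 66, 170575679,
"""

# Column indices in a 21-column PSL row.
MATCH, MISMATCH, REP, NCOUNT, QNUMINS, TNUMINS = 0, 1, 2, 3, 4, 6
QSIZE, QSTART, QEND, TNAME = 10, 11, 12, 13


def exact_full_length(fields) -> bool:
    """True if every query base is aligned with no mismatches, N's or gaps."""
    match, mismatch, rep, ncount, qgaps, tgaps = (
        int(fields[i]) for i in (MATCH, MISMATCH, REP, NCOUNT, QNUMINS, TNUMINS))
    q_size, q_start, q_end = int(fields[QSIZE]), int(fields[QSTART]), int(fields[QEND])
    return (mismatch == 0 and ncount == 0 and qgaps == 0 and tgaps == 0
            and q_start == 0 and q_end == q_size and match + rep == q_size)


def _rows(psl_text: str):
    return [line.split() for line in psl_text.splitlines() if line.strip()]


def filter_exact(psl_text: str):
    """Target names of all exact, full-length hits."""
    return [r[TNAME] for r in _rows(psl_text) if exact_full_length(r)]


def best_coverage(psl_text: str) -> float:
    """Largest fraction of the query spanned by any hit."""
    return max((int(r[QEND]) - int(r[QSTART])) / int(r[QSIZE]) for r in _rows(psl_text))


print("exact full-length hits:", filter_exact(STEP5_HITS))
print(f"best query coverage: {best_coverage(STEP5_HITS):.1%}")
```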

I would like to know what might cause the difference between the local and web versions, and how to configure my gfServer to get the same results as the web-based BLAT.

Best,

Inbal

------------

Inbal Paz

Bioinformatician and web developer

Yael Mandel Gutfreund's lab

Technion Israel Institute of Technology

Tel: +972-4-8293701

Brian Lee

Jan 8, 2014, 2:28:07 PM
to Inbal Paz, gen...@soe.ucsc.edu
Dear Inbal,

Thank you for using the UCSC Genome Browser and your question about Blat gfServer configuration.

Please see this FAQ about replicating web-based Blat parameters in the command-line version: http://genome.ucsc.edu/FAQ/FAQblat.html#blat5

These previously answered mailing list questions, along with searches for similar questions in our archives, may also be helpful:


Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

After reviewing the above resources, please also note that our engineer adds: if you want to get as close as you can to the web-based BLAT (hgBlat) results using gfServer/gfClient, you will need to run gfServer with tileSize=11 and stepSize=5, and run gfClient with -minScore=20. You may also wish to set -minIdentity, but it may be best to look at the output first and decide what to ignore. Once you have worked out the kind of filtering that is appropriate to your experiment or analysis, you could then use pslCDnaFilter or pslReps to filter the PSL output.
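The suggested settings can be captured in a small helper that builds the two command lines, which keeps the server and client parameters together when re-running comparisons. A sketch only: the host, port 17779, "hg19.2bit" (hg19 is merely a guess from the chromosome sizes in the PSL output) and the query/output file names are hypothetical, and only the options named in the reply (-tileSize, -stepSize, -minScore, -minIdentity) are used.

```python
import shlex
from typing import List, Optional


def gfserver_cmd(host: str, port: int, two_bit: str,
                 tile_size: int = 11, step_size: int = 5) -> List[str]:
    """gfServer start command with the tile/step sizes suggested in the reply."""
    return ["gfServer", "start", host, str(port),
            f"-tileSize={tile_size}", f"-stepSize={step_size}", two_bit]


def gfclient_cmd(host: str, port: int, seq_dir: str, query_fa: str, out_psl: str,
                 min_score: int = 20, min_identity: Optional[float] = None) -> List[str]:
    """gfClient command with -minScore (and optionally -minIdentity) set."""
    cmd = ["gfClient", f"-minScore={min_score}"]
    if min_identity is not None:
        cmd.append(f"-minIdentity={min_identity}")
    return cmd + [host, str(port), seq_dir, query_fa, out_psl]


# Hypothetical host, port and file names; adjust to your installation.
server = gfserver_cmd("localhost", 17779, "hg19.2bit")
client = gfclient_cmd("localhost", 17779, ".", "hsa-mir-17.fa", "hsa-mir-17.psl")
print(shlex.join(server))
print(shlex.join(client))
```

The lists could be passed to subprocess.run; keep in mind that gfServer start keeps running as a server until it is stopped, so it is normally launched separately (for example in the background) before any gfClient queries. Following the advice above, -minIdentity is left unset by default so the raw output can be inspected before deciding on a filter.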

All the best,

Brian Lee
UCSC Genome Bioinformatics Group

