BLAT

Lachlan Musicman

Jun 3, 2015, 11:38:14 AM
to genome...@soe.ucsc.edu
Hi.

1. Is it strictly necessary to have the gfServer run in memory?

1a. Does this mean that each time the server is rebooted for whatever reason, all of the gfServers need to be restarted? Where can I find the automation script for this?

2. Must I start a gfServer for each genome?

3. In the instructions found here https://genome.ucsc.edu/goldenPath/help/blatSpec.html#webBlatUsage

Under the section titled "Creating in-memory indexes with gfServer" there are two commands that contain magic numbers:
gfServer start bigRamMachine 17779 hg16.2bit &
gfServer -trans -mask start bigRamMachine 17778 hg16.2bit &
What do the 17779 and 17778 represent? Is it because we are running a server per genome, and these are port numbers?

4. What is the difference between a translated and an untranslated index, apart from the time taken to execute?

4a. If we have untranslated indices, does this mean I don't need a "Translated Server"?

5. If running webBlat locally, can bigRamMachine be replaced by hostname/localhost/127.0.0.1?

Cheers
L.




------
let's build quiet armies friends, let's march on their glass towers...let's build fallen cathedrals and make impractical plans

- GYBE

Lachlan Musicman

Jun 3, 2015, 11:38:53 AM
to genome...@soe.ucsc.edu
Also, if we add a new genome to the browser and want it available in BLAT, I presume the process is:

 - create the new gfServer using the 2bit created in step 3 of Building a new genome database http://genomewiki.ucsc.edu/index.php/Building_a_new_genome_database
 
 - edit /path/to/cgibin/webBlat.cfg

 - restart Apache

?

cheers
L.

Jonathan Casper

Jun 3, 2015, 8:02:20 PM
to Lachlan Musicman, genome...@soe.ucsc.edu

Hello Lachlan,

Thank you for your questions about configuring BLAT servers. The responses to your questions are inline below.

1. Is it strictly necessary to have the gfServer run in memory?

gfServer runs in memory to provide a fast, responsive server for BLAT requests. There is no option to run it without loading the sequence into memory.

1a. Does this mean that each time the server is rebooted for whatever reason, all of the gfServers need to be restarted? Where can I find the automation script for this?

Yes, gfServer needs to be restarted when the server is rebooted. We do not have an automation script for this in the kent tree - our gfServer instances are run on dedicated servers maintained by our system administrators. They are rarely rebooted, but I believe they make use of the rc.d system (specifically, an entry in rc.local) to ensure everything is restored when that becomes necessary.
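As a rough illustration of the rc.local approach mentioned above, here is a minimal sketch of a start script that could be called at boot. It is not a script from the kent tree. The host, ports, and file name are taken from the blatSpec examples quoted in this thread; the function names and the DRY_RUN switch are hypothetical. It assumes paths contain no spaces.

```shell
#!/bin/sh
# Sketch only: restart the gfServer instances for one assembly at boot.
# GF_HOST and the function names below are illustrative assumptions.
GF_HOST=localhost

# Print the gfServer command line for one instance.
# $1 = mode (dna or trans), $2 = port, $3 = .2bit file
gf_start_cmd() {
    case "$1" in
        trans) echo "gfServer -trans -mask start $GF_HOST $2 $3" ;;
        *)     echo "gfServer start $GF_HOST $2 $3" ;;
    esac
}

# Start the untranslated and translated instances for one assembly.
# $1 = .2bit file, $2 = untranslated port, $3 = translated port
# With DRY_RUN=1 (the default in this sketch) commands are only printed.
start_assembly() {
    twobit=$1
    dna_port=$2
    trans_port=$3
    for cmd in "$(gf_start_cmd dna "$dna_port" "$twobit")" \
               "$(gf_start_cmd trans "$trans_port" "$twobit")"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            $cmd &   # background the server, as in the documented examples
        fi
    done
}

start_assembly hg16.2bit 17779 17778
```

Setting DRY_RUN=0 would actually launch the servers; one start_assembly line per assembly keeps the port assignments in a single place.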

2. Must I start a gfServer for each genome?

Yes, a gfServer instance is specific to a genome assembly. We usually have two instances running per assembly: one for untranslated BLAT and one for translated.

3. In the instructions found here https://genome.ucsc.edu/goldenPath/help/blatSpec.html#webBlatUsage
Under the section titled "Creating in-memory indexes with gfServer" there are two commands that contain magic numbers: 
gfServer start bigRamMachine 17779 hg16.2bit &
gfServer -trans -mask start bigRamMachine 17778 hg16.2bit &
What do the 17779 and 17778 represent? Is it because we are running a server per genome, and these are port numbers?

Exactly right. 17779 and 17778 are the port numbers used to communicate with those servers. The port number should be different for each gfServer instance you have running.
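Since every instance needs its own port, it can help to keep the planned assignments in one list and check it for collisions before starting anything. The sketch below does that; the instance names and the ports beyond 17779 and 17778 are made-up examples, and duplicate_ports is a hypothetical helper.

```shell
# Hypothetical plan: one "name port" pair per line.
plan='hg16-dna 17779
hg16-trans 17778
mm10-dna 17781
mm10-trans 17780'

# Print every port that is assigned more than once in the plan.
duplicate_ports() {
    printf '%s\n' "$1" | awk '{print $2}' | sort | uniq -d
}

dups=$(duplicate_ports "$plan")
if [ -n "$dups" ]; then
    echo "duplicate ports: $dups"
else
    echo "all ports unique"
fi
```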

4. What is the difference between a translated and an untranslated index, apart from the time taken to execute?
4a. If we have untranslated indices, does this mean I don't need a "Translated Server"?

An untranslated index is an index used to speed up DNA BLAT searches. A translated index is used to search using translated sequence (e.g., amino acid sequences), and involves indexing the DNA database multiple times in different reading frames. That's why it takes longer. An untranslated index cannot be used in place of a translated index. Technically you could just start an untranslated server without also starting a translated server, but that would mean that users would only be able to run BLAT using DNA sequence - no translated RNA, translated DNA, or protein.
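To make the split concrete, here is a small sketch of how a query type might be routed to the matching server with gfClient. The ports are the ones from the examples in this thread; gf_client_cmd is a hypothetical helper that only prints the command line, and it assumes both servers run on localhost.

```shell
# Print a gfClient command line for the given query type.
# $1 = query type (dna, prot, or dnax), $2 = sequence directory,
# $3 = query FASTA file, $4 = output PSL file
gf_client_cmd() {
    case "$1" in
        dna)  echo "gfClient localhost 17779 $2 $3 $4" ;;                # untranslated server
        prot) echo "gfClient -t=dnax -q=prot localhost 17778 $2 $3 $4" ;; # translated server
        dnax) echo "gfClient -t=dnax -q=dnax localhost 17778 $2 $3 $4" ;; # translated server
        *)    echo "unknown query type: $1" >&2; return 1 ;;
    esac
}

gf_client_cmd dna  . query.fa out.psl
gf_client_cmd prot . query.fa out.psl
```

Without a translated server on port 17778, only the first kind of query in this sketch could be answered.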

5. If running webBlat locally, can bigRamMachine be replaced by hostname/localhost/127.0.0.1?

Yes, using localhost or 127.0.0.1 should work fine. Please note that the web BLAT interface provided by the UCSC Genome Browser is actually accomplished via the hgBlat CGI (one of the kent CGIs), and not webBlat. hgBlat communicates directly with gfServer instances using the information stored in the hgcentral.blatServers MySQL table.

Also, if we add a new genome to the browser, if we want it in BLAT, I presume the process is:

You should not need to restart Apache. Just start the new gfServer instance and edit your webBlat.cfg - webBlat loads the configuration file contents when it runs. Similarly, the only steps in adding another BLAT server to the hgBlat CGI are to start the server running and add the server information to the hgcentral.blatServers table in your MySQL server.
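For the hgBlat route, a sketch of what registering the two instances might look like is below. The column names (db, host, port, isTrans, canPcr) are given as I recall them from the mirror documentation, so please confirm them with DESCRIBE blatServers before relying on this. The database name myGenome and the helper blat_servers_sql are hypothetical, and marking only the untranslated server with canPcr=1 is an assumption.

```shell
# Print INSERT statements for one assembly's two gfServer instances.
# $1 = db name, $2 = host, $3 = untranslated port, $4 = translated port
blat_servers_sql() {
    printf "INSERT INTO blatServers (db, host, port, isTrans, canPcr) VALUES ('%s', '%s', %d, 0, 1);\n" "$1" "$2" "$3"
    printf "INSERT INTO blatServers (db, host, port, isTrans, canPcr) VALUES ('%s', '%s', %d, 1, 0);\n" "$1" "$2" "$4"
}

blat_servers_sql myGenome localhost 17779 17778
# To apply, the output could be piped to a MySQL client, for example:
#   blat_servers_sql myGenome localhost 17779 17778 | mysql hgcentral
```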

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group



Lachlan Musicman

Jul 10, 2015, 11:55:25 AM
to Jonathan Casper, genome...@soe.ucsc.edu
Thanks for your response Jonathan, that makes things clearer.

Given that they are all in memory, can I ask how much memory they take - are there rough measures?

If I were to deploy, what should I be putting aside for webBLAT, and have you deployed to independent servers because of the memory that is required?

Cheers
L.

Matthew Speir

Jul 17, 2015, 5:41:54 PM
to Lachlan Musicman, genome...@soe.ucsc.edu
Hi Lachlan,

Thank you for your question about BLAT and memory requirements. A typical untranslated BLAT server requires roughly 2-4 GB of memory.

We also run our BLAT servers with -stepSize=5 so that the In-Silico PCR (hgPCR) tool is sensitive enough to work with very short primers. Note that this setting will also increase the memory used by roughly 2x.
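For planning purposes, those figures can be turned into a back-of-the-envelope range. The sketch below assumes the 2x factor applies on top of the 2-4 GB figure and covers untranslated servers only, since no number for translated servers is given here. The count of three assemblies is an arbitrary example and estimate_gb is a hypothetical helper.

```shell
# Rough memory estimate in GB for untranslated gfServer instances.
# $1 = number of instances, $2 = GB per instance (2 to 4 per the figures above),
# $3 = multiplier (2 when running with -stepSize=5, otherwise 1)
estimate_gb() {
    echo $(( $1 * $2 * $3 ))
}

echo "low:  $(estimate_gb 3 2 2) GB"
echo "high: $(estimate_gb 3 4 2) GB"
```

With three assemblies and -stepSize=5 this gives a range of 12 to 24 GB, before allowing anything for translated servers or the rest of the system.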

I hope this is helpful. If you have any further questions, please reply to genome...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group