Hello Lachlan,
Thank you for your questions about configuring BLAT servers. The responses to your questions are inline below.
1. Is it strictly necessary to have the gfServer run in memory?
gfServer runs in memory to provide a fast, responsive server for BLAT requests. There is no option to run it without loading the sequence into memory.
1a. Does this mean that each time the server is rebooted for whatever reason, all of the gfServers need to be restarted? Where can I find the automation script for this?
Yes, gfServer needs to be restarted when the server is rebooted. We do not have an automation script for this in the kent tree - our gfServer instances are run on dedicated servers maintained by our system administrators. They are rarely rebooted, but I believe they make use of the rc.d system (specifically, an entry in rc.local) to ensure everything is restored when that becomes necessary.
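As a rough illustration only (this is not a script from the kent tree), a start script called from rc.local might look something like the sketch below. The GENOME_DIR path, the GF_HOST and DRY_RUN variables, the log file names, and the start_pair helper are all hypothetical; only the two gfServer command lines come from the BLAT documentation quoted in question 3. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of a boot-time gfServer start script (e.g. invoked from rc.local).
# GENOME_DIR, GF_HOST, DRY_RUN and start_pair are illustrative assumptions.
GENOME_DIR="${GENOME_DIR:-/data/genomes}"   # hypothetical directory holding the .2bit files
GF_HOST="${GF_HOST:-localhost}"
DRY_RUN="${DRY_RUN:-1}"                     # 1 = only print the commands; 0 = launch them

# start_pair ASSEMBLY DNA_PORT TRANS_PORT
# Prints or starts the untranslated and translated gfServer for one assembly.
start_pair() {
    asm="$1"; dna_port="$2"; trans_port="$3"
    dna_cmd="gfServer start $GF_HOST $dna_port $asm.2bit"
    trans_cmd="gfServer -trans -mask start $GF_HOST $trans_port $asm.2bit"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$dna_cmd"
        echo "$trans_cmd"
    else
        # Run from the directory holding the .2bit file so the relative name resolves.
        (cd "$GENOME_DIR" && $dna_cmd >"$asm.dna.log" 2>&1 &)
        (cd "$GENOME_DIR" && $trans_cmd >"$asm.trans.log" 2>&1 &)
    fi
}

# One call per assembly, each with its own pair of ports.
start_pair hg16 17779 17778
```

Setting DRY_RUN=0 would launch the servers in the background; adding another assembly means adding another start_pair line with two unused ports.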
2. Must I start a gfServer for each genome?
Yes, a gfServer instance is specific to a genome assembly. We usually have two instances running per assembly: one for untranslated BLAT and one for translated.
3. In the instructions found here https://genome.ucsc.edu/goldenPath/help/blatSpec.html#webBlatUsage
Under the section titled "Creating in-memory indexes with gfServer" there are two commands that contain magic numbers:
gfServer start bigRamMachine 17779 hg16.2bit &
gfServer -trans -mask start bigRamMachine 17778 hg16.2bit &
What do the 17779 and 17778 represent? Is it because we are running a server per genome, and these are port numbers?
Exactly right. 17779 and 17778 are the port numbers used to communicate with those servers. The port number should be different for each gfServer instance you have running.
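If you end up running many instances, a small sanity check on your port assignments before starting anything can help. The following is only a sketch: the check_ports helper and the "name port" list format are made up for illustration and are not part of the BLAT tools.

```shell
#!/bin/sh
# check_ports: read "name port" lines on stdin and report any port listed
# more than once. Hypothetical helper, not part of the kent tools.
check_ports() {
    awk '
        { count[$2]++ }
        END {
            for (p in count)
                if (count[p] > 1) { print "duplicate port: " p; bad = 1 }
            exit bad
        }'
}

# Example list: one line per planned gfServer instance.
printf '%s\n' "hg16 17779" "hg16trans 17778" | check_ports && echo "ports OK"
```

Once a server is running, `gfServer status localhost 17779` can be used to confirm it is answering on the expected port.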
4. What is the difference between a translated and an untranslated index, apart from the time taken to execute?
4a. If we have untranslated indices, does this mean I don't need a "Translated Server"?
An untranslated index is used to speed up DNA BLAT searches. A translated index is used to search with translated sequence (e.g., amino acid sequences), and involves indexing the DNA database multiple times in different reading frames. That is why it takes longer. An untranslated index cannot be used in place of a translated index. Technically, you could start an untranslated server without also starting a translated server, but users would then only be able to run BLAT with DNA sequence: no translated RNA, translated DNA, or protein.
5. If running webBlat locally, can bigRamMachine be replaced by hostname/localhost/127.0.0.1?
Yes, using localhost or 127.0.0.1 should work fine. Please note that the web BLAT interface provided by the UCSC Genome Browser is actually accomplished via the hgBlat CGI (one of the kent CGIs), and not webBlat. hgBlat communicates directly with gfServer instances using the information stored in the hgcentral.blatServers MySQL table.
Also, if we add a new genome to the browser, if we want it in BLAT, I presume the process is:
- create the new gfServer using the 2bit created in step 3 of Building a new genome database http://genomewiki.ucsc.edu/index.php/Building_a_new_genome_database
- edit /path/to/cgibin/webBlat.cfg
- restart Apache
You should not need to restart Apache. Just start the new gfServer instance and edit your webBlat.cfg - webBlat loads the configuration file contents when it runs. Similarly, the only steps in adding another BLAT server to the hgBlat CGI are to start the server running and add the server information to the hgcentral.blatServers table in your MySQL server.
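For the hgBlat route, the table entries might be generated along these lines. This is a sketch under assumptions: the column names (db, host, port, isTrans, canPcr) are how I understand the blatServers table to be laid out, so please confirm them with DESCRIBE blatServers on your own mirror first. The blat_server_sql helper is hypothetical, and the block only prints the SQL rather than running it.

```shell
#!/bin/sh
# blat_server_sql DB HOST PORT IS_TRANS CAN_PCR
# Prints an INSERT statement for hgcentral.blatServers.
# Column names are an assumption; verify them with "DESCRIBE blatServers".
blat_server_sql() {
    printf "INSERT INTO blatServers (db, host, port, isTrans, canPcr) VALUES ('%s', '%s', %s, %s, %s);\n" \
        "$1" "$2" "$3" "$4" "$5"
}

# Untranslated (DNA) server and translated server for the same assembly.
blat_server_sql hg16 localhost 17779 0 1
blat_server_sql hg16 localhost 17778 1 0
```

The printed statements could then be piped to your MySQL client against the hgcentral database, for example `blat_server_sql hg16 localhost 17779 0 1 | mysql hgcentral`, with whatever credentials your installation uses.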
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group