BLAT on custom assembly hub


张翼

Jan 5, 2015, 1:16:47 PM
to gen...@soe.ucsc.edu
To whom it may concern,

We put a new genome assembly on GBiB as a custom assembly hub and it worked nicely. However, we would also like to make the web-based BLAT service available to this custom hub. Could you let me know how I can set it up? Thank you very much in advance.

P.S. GBiB is very easy to set up, but it came with a few problems. We could not compile the Kent utilities on it at all (so we have to compile and run some of the commands on a different machine), probably because of a gcc or library problem. There is also a strange problem with apt-get: it never worked in our lab or at home, so we could not upgrade the libraries or even use git. When we first tried to install the new assembly we used the old method described with hgsql commands etc., and the fact that the source never compiles on the virtual machine bugged us a bit, though in the end we worked around it by using the mysql command line directly. But many more programs, for example hgGcContent, never work on the VM.
 
Best,
Yi
 


-- 
No fate but what we make.

Zhang, Yi
Room 328, Jin Guang Life Sciences Building,
College of Life Sciences, Peking University,
5 Yiheyuan Rd., Beijing 100871
China, People's Republic of
Tel: 0086-10-80726688-8368(NIBS)/0086-10-62751864(PKU) 
http://gplus.to/synapse
 

Brian Lee

Jan 8, 2015, 6:13:16 PM
to 张翼, gen...@soe.ucsc.edu

Dear Yi,

Thank you for using the UCSC Genome Browser and the new GBiB, and for your message about adding BLAT to a custom assembly hub. Next week we will release a new feature that enables BLAT support for assembly hubs.

To create an assembly hub on the browser, or on the GBiB, you do not have to use hgcentral, complex development tools, or mysql commands. Instead, follow these steps to create a few text files (hub.txt, genomes.txt, trackDb.txt) and the other associated assembly files in an accessible directory: http://genomewiki.ucsc.edu/index.php/Assembly_Hubs

Below I outline copying a small assembly hub to your local disk and then loading it on your GBiB so that you have a working example. On your computer, choose a directory where you do not mind copying the files below, about 33M total. In that directory, run the following wget command to recursively grab the directory structure and files needed:

wget -r --no-parent --reject "index.html*" -nH --cut-dirs=3 http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/
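
For readers wondering where the files end up, the effect of -nH and --cut-dirs=3 on the local directory layout can be sketched with plain shell string handling. This is only an illustration of the path arithmetic, not something wget needs you to run; the variable names are made up for the example.

```shell
# Illustration only: reproduce the local path that wget builds from the URL.
url='http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/'
cut_dirs=3   # same value as --cut-dirs=3 in the wget command

# -nH: do not create a top-level directory named after the host.
path=${url#*://}   # drop "http://"
path=${path#*/}    # drop "genome.ucsc.edu/"

# --cut-dirs=3: drop the first three remaining directory components
# (goldenPath/help/examples).
local_dir=$(printf '%s\n' "$path" | cut -d/ -f"$((cut_dirs + 1))"-)

echo "$local_dir"
```

So the hub.txt should land in hubExamples/hubAssembly/plantAraTha1/ beneath the directory where wget was run, which matches the paths used later in this thread.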

Once you have copied this assembly hub, follow the GBiB instructions on sharing the folder where it is located. First power off the machine, then select the Settings option for the machine in VirtualBox, and click the "Shared Folders" tab and the plus folder icon, as described here with pictures: http://genome.ucsc.edu/goldenPath/help/gbib#YourTracks

Then you can restart your GBiB and navigate to the folders location, http://127.0.0.1:1234/folders/, to see your shared files, and then look for where you copied the hub.txt. In my example, the hub.txt for this assembly hub is in my shared Google Drive folder: http://127.0.0.1:1234/folders/sf_Google_Drive/trackHubAssembly/hubExamples/hubAssembly/plantAraTha1/hub.txt

Paste the URL to the shared folder location of your copied version of this hub.txt into the My Hubs tab on the Hub page, http://127.0.0.1:1234/cgi-bin/hgHubConnect, and it should load fine.
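
As a small sketch of how a path under /folders turns into the URL to paste, assuming your GBiB serves /folders at http://127.0.0.1:1234/folders/ as in the example above. The path shown is the one from my example; yours will differ. The hgGateway?hubUrl= form is the direct-load link used later in this thread.

```shell
# Path of hub.txt as seen inside the GBiB (example path; substitute your own).
hub_path=/folders/sf_Google_Drive/trackHubAssembly/hubExamples/hubAssembly/plantAraTha1/hub.txt

# Base address of the GBiB web server as reached from the host machine.
base=http://127.0.0.1:1234

# URL to paste into the My Hubs tab of hgHubConnect.
hub_url="$base$hub_path"
echo "$hub_url"

# Direct-load link that passes the same URL to hgGateway.
gateway_url="$base/cgi-bin/hgGateway?hubUrl=$hub_url"
echo "$gateway_url"
```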

Now that you have an assembly hub loaded and working fine on your GBiB, you can explore the details regarding its structure. This small plant assembly hub is a slice of a larger assembly hub you can also explore: http://genome-test.cse.ucsc.edu/~hiram/hubs/Plants/

At the end of next week we will add BLAT capabilities to assembly hubs. When the changes are public you can acquire these changes by running gbibUpdate on the command line of your GBiB.

I will now go over how to activate BLAT on the above Assembly Hub example once those updates have been acquired after next week.

First navigate to the location of your copied genomes.txt and remove the comment character "#" from the two lines mentioning BLAT, so that they read:
blat localhost 17779
transBlat localhost 17777
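
If you would rather not edit the file by hand, the uncommenting can be sketched with sed. This assumes the two lines are commented with a leading "#" optionally followed by spaces; the sample file below is a hypothetical stand-in built in a temporary directory, so check your real genomes.txt before adapting it (files under /folders may also need sudo).

```shell
# Hypothetical stand-in for the hub's genomes.txt; the real file has more lines.
workdir=$(mktemp -d)
cat > "$workdir/genomes.txt" <<'EOF'
genome araTha1
twoBitPath araTha1/araTha1.2bit
# blat localhost 17779
# transBlat localhost 17777
EOF

# Keep a backup, then strip the leading "#" only from lines of the exact form
# "blat <host> <port>" or "transBlat <host> <port>". Other comments are untouched.
cp "$workdir/genomes.txt" "$workdir/genomes.txt.bak"
sed -E 's/^#[[:space:]]*((blat|transBlat)[[:space:]]+[^[:space:]]+[[:space:]]+[0-9]+[[:space:]]*)$/\1/' \
    "$workdir/genomes.txt.bak" > "$workdir/genomes.txt"

# Show the two now-active lines.
grep -E '^(blat|transBlat) ' "$workdir/genomes.txt"
```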

Now that the genomes.txt is updated for this assembly hub we only need to start BLAT servers on the GBiB with gfServer. To do this navigate on your GBiB to where the 2bit is located for this assembly hub (araTha1.2bit). In my example:

cd /folders/sf_Google_Drive/trackHubAssembly/hubExamples/hubAssembly/plantAraTha1/araTha1/

I suggest you take advantage of the ability to ssh into your GBiB to run the gfServer commands. When your GBiB is running, use the following command to access it: ssh browser@localhost -p 1235 (the password is "browser"). Read more about accessing GBiB with ssh here: http://genome.ucsc.edu/goldenpath/help/gbib.html#YourTracks

From this location on your GBiB run the following gfServer commands that will start two BLAT servers in the background to enable amino acid and DNA sequence blatting:

gfServer start localhost 17777 -trans -mask araTha1.2bit &
gfServer start localhost 17779 -stepSize=5 araTha1.2bit &

You can use ps to see these processes running, and kill -9 #### (where #### is the process ID) to end them. Note that port 17777 and the "-trans" option for amino acid blatting match the line added to genomes.txt, transBlat localhost 17777. Read more about gfServer and BLAT configuration here:
https://genome.ucsc.edu/FAQ/FAQblat.html#blat5
http://genomewiki.ucsc.edu/index.php/Running_your_own_gfServer
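
Since the ports in genomes.txt and the ports given to gfServer have to agree, one way to avoid a mismatch is to derive the gfServer command lines from genomes.txt itself. The following is a minimal sketch, not an official tool: it only prints the commands, and the sample genomes.txt and the twobit variable are assumptions made for the illustration.

```shell
# Hypothetical stand-in for an already-edited genomes.txt.
workdir=$(mktemp -d)
cat > "$workdir/genomes.txt" <<'EOF'
genome araTha1
twoBitPath araTha1/araTha1.2bit
blat localhost 17779
transBlat localhost 17777
EOF

# Name of the 2bit file in the directory where gfServer will be started.
twobit=araTha1.2bit

# For each blat/transBlat line, print the matching gfServer start command,
# using the same options as the commands shown above.
cmds=$(awk -v twobit="$twobit" '
  $1 == "blat"      { printf "gfServer start %s %s -stepSize=5 %s &\n", $2, $3, twobit }
  $1 == "transBlat" { printf "gfServer start %s %s -trans -mask %s &\n", $2, $3, twobit }
' "$workdir/genomes.txt")

printf '%s\n' "$cmds"
```

Once the servers are up, gfServer status localhost 17779 should report on the running server, and gfServer stop localhost 17779 is a gentler alternative to kill -9.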

With these gfServer commands running in the background on the GBiB you can now load the Assembly Hub and run BLAT operations.

Regarding the utilities on the GBiB, if you run ls $HOME/bin you should see the entire list of those available. One can also run a gbibAddTools command, but it should not be necessary. Again, I highly recommend using ssh, ssh browser@localhost -p 1235, to enter your GBiB from your computer's terminal program. Also, while not necessarily recommended, here are some internal notes about converting a GBiB into a machine for development: http://genomewiki.soe.ucsc.edu/genecats/index.php/Gbib_development#Converting_your_gbib_into_a_machine_for_development

Thank you again for trying out the GBiB and using the assembly hub features. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group

P.S. Regarding shared folders, if desired you can build them entirely in the GBiB. What follows is a condensed review of the above with the relative locations that would result. I suggest first using ssh browser@localhost -p 1235 to access the running GBiB from your computer's terminal. Then, after going to the shared folders with cd /folders, one could use sudo to wget the assembly hub: sudo wget -r --no-parent --reject "index.html*" -nH --cut-dirs=3 http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/

You could then load this hub by opening http://127.0.0.1:1234/cgi-bin/hgGateway?hubUrl=http://127.0.0.1:1234/folders/hubExamples/hubAssembly/plantAraTha1/hub.txt and selecting it under "group" with "Plant araTha1". Then, when the blat update is out and obtained with gbibUpdate, you could navigate to the genomes.txt file with cd /folders/hubExamples/hubAssembly/plantAraTha1/, edit the commented blat lines with sudo vi genomes.txt, change directories to the 2bit files with cd /folders/hubExamples/hubAssembly/plantAraTha1/araTha1, and run the two gfServer commands to start the BLAT servers:
gfServer start localhost 17777 -trans -mask araTha1.2bit &
gfServer start localhost 17779 -stepSize=5 araTha1.2bit &



--


Brian Lee

Jan 14, 2015, 2:10:56 PM
to 张翼, gen...@soe.ucsc.edu

Dear Yi,

The new feature enabling blat on assembly hubs has been released to the browser and can be obtained on the GBiB. In case you, or future users reviewing this archived mailing list, would like to try this feature on a GBiB, here is a review of the steps to take.

1. First start your GBiB; here is the user guide: http://genome.ucsc.edu/goldenPath/help/gbib.html

2. With your GBiB running, use your computer's terminal program to ssh into it: ssh browser@localhost -p 1235, password "browser". If you have an older GBiB, you can run gbibUpdate to synchronize it.

3. To test out the blat feature on assembly hubs you can grab this example assembly hub. Go to the GBiB's folders directory with cd /folders. Then use sudo to wget this assembly hub: sudo wget -r --no-parent --reject "index.html*" -nH --cut-dirs=3 http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/

4. In your terminal, navigate to the genomes.txt file of this assembly hub with cd /folders/hubExamples/hubAssembly/plantAraTha1/ and edit the currently commented-out blat lines with sudo vi genomes.txt. Press "x" when the cursor is over the # at the start of each line to remove it, then use :w! to save the changes and :q to quit. The two lines should then read:


blat localhost 17779
transBlat localhost 17777

5. With these blat lines in place in the genomes.txt of the assembly hub, change directories to the 2bit files with cd /folders/hubExamples/hubAssembly/plantAraTha1/araTha1 and run the two gfServer commands to start the blat servers. Use ps to see the processes running.


gfServer start localhost 17777 -trans -mask araTha1.2bit &
gfServer start localhost 17779 -stepSize=5 araTha1.2bit &

6. This assembly hub can now be loaded on your GBiB by clicking this URL and selecting it under the "group" category where "Plant araTha1" displays: http://127.0.0.1:1234/cgi-bin/hgGateway?hubUrl=http://127.0.0.1:1234/folders/hubExamples/hubAssembly/plantAraTha1/hub.txt

7. Now on the blat page, http://127.0.0.1:1234/cgi-bin/hgBlat, you can select the Arabidopsis thaliana assembly and blat plant amino acid sequences, like IYQTRENKYIIGEIQITESERDRRRSSLPGNH or DNA sequences, like TAAGTAAAAAATAATATGATTAAGACTAATAAATCTTAATAGTTAATACT.
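
For anyone who wants to check the two servers from the command line before trying the web page, here is a rough sketch using gfClient with the same two example sequences. It assumes gfClient is available on your PATH (I have not confirmed it is in $HOME/bin on every GBiB) and that you run it from the directory holding araTha1.2bit, so that "." works as the sequence directory. The file names are made up for the example.

```shell
# Write the two example queries from step 7 as FASTA files in a scratch directory.
workdir=$(mktemp -d)
printf '>prot_query\n%s\n' 'IYQTRENKYIIGEIQITESERDRRRSSLPGNH' > "$workdir/prot.fa"
printf '>dna_query\n%s\n' 'TAAGTAAAAAATAATATGATTAAGACTAATAAATCTTAATAGTTAATACT' > "$workdir/dna.fa"

if command -v gfClient >/dev/null 2>&1; then
    # DNA query against the untranslated server (port 17779).
    gfClient localhost 17779 . "$workdir/dna.fa" "$workdir/dna.psl" \
        || echo "DNA query failed; is the server on 17779 running?"
    # Protein query against the translated server (port 17777).
    gfClient -t=dnax -q=prot localhost 17777 . "$workdir/prot.fa" "$workdir/prot.psl" \
        || echo "Protein query failed; is the server on 17777 running?"
else
    echo "gfClient not found on PATH; skipping command-line queries"
fi
```

If the queries succeed, the .psl files in the scratch directory should list the alignments; an empty result more likely points to a port or 2bit mismatch than to a problem with the hub itself.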

Thank you again for trying out the GBiB. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group
