New genome assembly in GBiB

David da Silva Pires

Oct 30, 2014, 5:17:07 PM
to gen...@soe.ucsc.edu
Hi! Thank you very much for the release of Genome Browser in a Box. It has saved me a lot of work!

I read all the documentation at
http://genome.ucsc.edu/goldenPath/help/gbib.html
but it is not clear to me whether it is possible to add a new genome assembly to GBiB.

I would like to add the genome of Schistosoma mansoni, along with many track hubs and custom annotation tracks. Although I saw that it is possible to add the latter two, I can't find any information about including a new genome. Is that possible? If it is, is the procedure the same as the one for adding a new genome assembly to a UCSC Genome Browser mirror?

Besides that, I find it very hard to locate information about how to include a new genome assembly. Do you recommend a specific URL? Sorry if the question is too basic, but I am a newbie at the UCSC Genome Browser.

Thanks in advance.

--
David da Silva Pires

Matthew Speir

Oct 31, 2014, 12:44:12 PM
to David da Silva Pires, gen...@soe.ucsc.edu
Hi David,

Thank you for your question about adding a new assembly to the Genome Browser in a Box (GBiB). You can add a new assembly to your GBiB by creating an assembly hub. Our assembly hub feature is an extension of the track hub feature that allows you to specify your own sequence file and related annotations. For more information on creating your own assembly hub, please see the following help pages:
If you create the assembly hub locally on your computer, you can add it to GBiB by sharing the folder with GBiB according to the "Loading local big data tracks and track hubs" section: http://genome.ucsc.edu/goldenPath/help/gbib.html#LocalTracks. You can also use GBiB to create the 2bit and big data files for your assembly hub by following the instructions for downloading and using the Genome Browser utilities through GBiB in the "Data and track conversion tools" paragraph of that section.
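
A 2bit file is usually accompanied by a chrom.sizes file (one line per sequence: name, tab, length), which the big data conversion tools expect. As a small cross-check that needs only awk, the lengths can be computed straight from the FASTA. This is a sketch and not the method the help pages describe: with the Genome Browser utilities the usual route is faToTwoBit followed by twoBitInfo. The function name fasta_sizes and the file name genome.fa in the usage note are made up for illustration.

```shell
# Sketch only: print "name<TAB>length" for every record of a FASTA file.
# Usual route with the kent utilities (assumed to be on PATH):
#   faToTwoBit genome.fa genome.2bit
#   twoBitInfo genome.2bit stdout | sort -k2,2nr > genome.chrom.sizes
fasta_sizes() {
  awk '
    /^>/ {
      # New record: emit the previous one, then start counting again.
      if (name != "") printf "%s\t%d\n", name, len
      name = substr($1, 2)   # first word of the header, without ">"
      len = 0
      next
    }
    { len += length($0) }    # sequence line: add its length
    END { if (name != "") printf "%s\t%d\n", name, len }
  ' "$1"
}
```

For example, `fasta_sizes genome.fa > genome.chrom.sizes`. The sketch assumes Unix line endings; a file with Windows line endings would be over-counted by one character per line.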

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group
--


David da Silva Pires

Nov 7, 2014, 11:53:11 AM
to gen...@soe.ucsc.edu
Hi, Matthew.

Thank you very much for indicating the URLs.

By reading them, I created my Schistosoma mansoni assembly hub, as shown below:

hubs/
└── helminths
    ├── geneNetwork.html
    ├── genomes.txt
    ├── hub.txt
    └── schMan1
        ├── description.html
        ├── groups.txt
        ├── Schistosoma_mansoni.jpg
        ├── Schistosoma_mansoni_v5.2.fa
        ├── schMan1.2bit
        ├── schMan1.chrom.sizes
        ├── schMan1.tab
        └── trackDb.txt

Unfortunately, it has not been easy to find out how to load these files into the Genome Browser so that I can select the new genome and visualize the data. It seems that, besides these files, it is also necessary to create and populate some tables in the MySQL database. Is that true?

Following this suggestion, I started to execute the commands on this page:
http://genomewiki.ucsc.edu/index.php/Building_a_new_genome_database

But step 2 requires a command called hgFakeAgp, which is not present in GBiB, not even after running gbibAddTools. Can you tell me if there is a repository where I could download this binary for the Ubuntu virtual machine where GBiB is installed? Or is the tutorial on this page too old, and is there another way to make the Genome Browser list my new assembly among the menu options?

Thank you, again.

Best regards.

--
David da Silva Pires

David da Silva Pires

Nov 10, 2014, 4:36:00 PM
to gen...@soe.ucsc.edu
Hi.

To set up the Schistosoma mansoni genome as an assembly hub in GBiB, I am no longer following the wiki page mentioned in my previous message. Instead, I'm trying to run the commands listed in the following slides:

https://banana-slug.soe.ucsc.edu/_media/lecture_notes:genomebrowsersetup.pdf

This seems to be an approach closely tied to the databases that exist in the Genome Browser. But I'm stuck on the following commands:

======================================================================
browser@browserbox:~/local/src/jksrc/kent/src/hg/makeDb/trackDb$ hgTrackDb -strict schMan1 schMan1 trackDb ~/local/src/jksrc/kent/src/hg/lib/trackDb.sql .
SQL_CONNECT 272 localhost hgcentral localhost root
SQL_TIME 272 localhost hgcentral 0.037s
SQL_QUERY 272 localhost hgcentral NOSQLINJ SELECT 1 FROM tableList LIMIT 0
SQL_TIME 272 localhost hgcentral 0.000s
SQL_TABLE_NOT_EXISTS 272 localhost hgcentral tableList
SQL_NOT_FOUND_TABLE_CACHE 272 localhost hgcentral tableList
SQL_QUERY 272 localhost hgcentral NOSQLINJ SELECT 1 FROM userDb LIMIT 0
SQL_TIME 272 localhost hgcentral 0.000s
SQL_FETCH 272 localhost hgcentral 0.000s
SQL_NOT_FOUND_TABLE_CACHE 272 localhost hgcentral tableList
SQL_QUERY 272 localhost hgcentral NOSQLINJ SELECT 1 FROM sessionDb LIMIT 0
SQL_TIME 272 localhost hgcentral 0.000s
SQL_FETCH 272 localhost hgcentral 0.000s
SQL_QUERY 272 localhost hgcentral NOSQLINJ select name from dbDb where name = 'schMan1'
SQL_TIME 272 localhost hgcentral 0.000s
SQL_FETCH 272 localhost hgcentral 0.000s
SQL_CONNECT 273 localhost schMan1 localhost root
SQL_TIME 273 localhost schMan1 0.000s
SQL_QUERY 273 localhost schMan1 NOSQLINJ SELECT 1 FROM tableList LIMIT 0
SQL_TIME 273 localhost schMan1 0.000s
SQL_TABLE_NOT_EXISTS 273 localhost schMan1 tableList
SQL_FAILOVER_NO_TABLE_CACHE_FOR_DB 273 localhost schMan1 schMan1
SQL_QUERY 273 localhost schMan1 NOSQLINJ SHOW TABLES
SQL_TIME 273 localhost schMan1 0.000s
SQL_FETCH 273 localhost schMan1 0.000s
SQL_QUERY 0 genome-mysql.cse.ucsc.edu schMan1 NOSQLINJ SHOW TABLES
Couldn't connect to database schMan1 on genome-mysql.cse.ucsc.edu as genomep.
Unknown database 'schMan1'
SQL_TOTAL_TIME 0.612s
SQL_TOTAL_QUERIES 7
SQL_DISCONNECT 272 hgcentral
SQL_TIME 272 localhost hgcentral 0.000s
SQL_DISCONNECT 273 schMan1
SQL_TIME 273 localhost schMan1 0.001s
SQL_DISCONNECT 13428376 schMan1
SQL_TIME 13428376 genome-mysql.cse.ucsc.edu schMan1 0.000s
======================================================================



======================================================================
browser@browserbox:~/local/src/jksrc/kent/src/hg/makeDb/trackDb$ hgFindSpec -strict schMan1 schMan1 hgFindSpec ~/local/src/jksrc/kent/src/hg/lib/hgFindSpec.sql .
SQL_CONNECT 278 localhost schMan1 localhost root
SQL_TIME 278 localhost schMan1 0.008s
SQL_QUERY 278 localhost schMan1 NOSQLINJ SELECT 1 FROM tableList LIMIT 0
SQL_TIME 278 localhost schMan1 0.000s
SQL_TABLE_NOT_EXISTS 278 localhost schMan1 tableList
SQL_FAILOVER_NO_TABLE_CACHE_FOR_DB 278 localhost schMan1 schMan1
SQL_QUERY 278 localhost schMan1 NOSQLINJ SELECT 1 FROM trackDb LIMIT 0
SQL_FAILOVER 278 localhost schMan1 db -> slow-db | SELECT 1 FROM trackDb LIMIT 0
Couldn't connect to database schMan1 on genome-mysql.cse.ucsc.edu as genomep.
Unknown database 'schMan1'
SQL_TOTAL_TIME 0.783s
SQL_TOTAL_QUERIES 2
SQL_DISCONNECT 278 schMan1
hgFindSpec: jksql.c:395: monitorEnter: Assertion `monitorEnterTime == 0' failed.
Aborted
======================================================================



Note that both commands seem to fail because they look for the schMan1 database on the UCSC server genome-mysql.cse.ucsc.edu. How can I tell these commands that the Schistosoma mansoni database (schMan1) is in the MySQL instance running on localhost (the GBiB virtual disk) instead of in an online database?

Another question: if there is an easier way to configure a new assembly hub in the UCSC Genome Browser, could you please tell me?

Thanks in advance.

Brian Lee

Nov 10, 2014, 5:45:33 PM
to David da Silva Pires, gen...@soe.ucsc.edu

Dear David,

Thank you for using the UCSC Genome Browser and the new GBiB, and for your question about creating and loading an assembly hub.

With GBiB you can skip the wiki page you referenced. That page was written before GBiB existed, for users who run mirrors of the entire Genome Browser system and want to add a new assembly. You do not have to do any kind of MySQL operations when using assembly hubs. To create an assembly hub for the browser or for GBiB, you do not have to go further than the steps here: http://genomewiki.ucsc.edu/index.php/Assembly_Hubs The big advantage of GBiB is that the hub.txt, genomes.txt, trackDb.txt and related assembly hub files can be hosted locally rather than on a publicly accessible server.
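
As a rough sketch of what two of those files can look like for a single-genome hub, the script below writes a minimal hub.txt and genomes.txt. The directory name hub_sketch, the labels, the e-mail address, and the description, defaultPos and orderKey values are placeholders and not values from this thread. The relative paths assume that the 2bit, trackDb.txt, groups.txt and description.html sit in a schMan1 subdirectory next to genomes.txt, and defaultPos has to be changed to a sequence name that actually exists in the 2bit file.

```shell
# Sketch only: write a minimal hub.txt and genomes.txt for one assembly.
# HUB_DIR and all labels and values below are placeholders (assumptions).
HUB_DIR="${HUB_DIR:-hub_sketch/helminths}"
mkdir -p "$HUB_DIR/schMan1"

# hub.txt describes the hub and points at genomes.txt (relative path).
cat > "$HUB_DIR/hub.txt" <<'EOF'
hub helminths
shortLabel Helminths
longLabel Helminth assembly hub (placeholder label)
genomesFile genomes.txt
email someone@example.org
EOF

# genomes.txt holds one stanza per assembly; paths are relative to this file.
cat > "$HUB_DIR/genomes.txt" <<'EOF'
genome schMan1
trackDb schMan1/trackDb.txt
twoBitPath schMan1/schMan1.2bit
groups schMan1/groups.txt
htmlPath schMan1/description.html
description Placeholder assembly description
organism Schistosoma mansoni
scientificName Schistosoma mansoni
defaultPos chr1:1-10000
orderKey 4000
EOF

echo "Wrote $HUB_DIR/hub.txt and $HUB_DIR/genomes.txt"
```

Setting HUB_DIR before running the script writes the files elsewhere; existing files with the same names in that directory are overwritten.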

To begin with, be sure you have your GBiB up and running following the steps here: http://genome.ucsc.edu/goldenPath/help/gbib.html

Below I suggest copying a small working assembly hub to your local disk and then loading it in your GBiB. After that, you can copy it or use its structure as a model to sort out loading your own assembly hub. Thank you again for taking the time to work with these new tools!

Given that your GBiB is working, you should be able to load any publicly accessible hub, including this crocodile assembly hub:

http://127.0.0.1:1234/cgi-bin/hgHubConnect?hgHub_do_firstDb=on&hgHub_do_redirect=on&hgHubConnect.remakeTrackHub=on&hubUrl=http://hgwdev.cse.ucsc.edu/~jcarmstr/crocBrowserRC2/hub.txt

Now that you have checked that your GBiB is working, choose a directory on your computer where you do not mind copying the files below, about 33M in total. In that directory, run the following wget command to recursively grab the directory structure and files needed:

wget -r --no-parent --reject "index.html*" -nH --cut-dirs=3 http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/

Alternatively, if you do not have wget installed, you can curl these files individually (skipping the optional .html files if desired). Run curl -O in the location where you wish to copy the files:

curl -O http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt

If you use curl, be sure to recreate the structure with matching araTha1 and araTha1/bbi directories. If you don't have curl, you could also recreate the files with a text editor. Double-check that you have them all by looking here: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/
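
One way to keep the directory structure intact with curl is a small helper that creates the parent directory before each download. This is only a sketch: fetch_hub_file and DRY_RUN are made-up names, and apart from hub.txt the file names have to be taken from the directory listing, so they are left as placeholders in the usage comments.

```shell
# Sketch only: fetch one hub file with curl while recreating its directory.
# BASE_URL is the example directory from this message; fetch_hub_file and
# DRY_RUN are made-up names for illustration.
BASE_URL="http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1"

fetch_hub_file() {
  rel="$1"                          # path relative to the hub root
  mkdir -p "$(dirname "$rel")"      # e.g. creates araTha1/bbi if needed
  if [ -n "${DRY_RUN:-}" ]; then
    # Show what would be downloaded without touching the network.
    echo "curl -fsS -o $rel $BASE_URL/$rel"
  else
    # -f: fail on HTTP errors, -sS: quiet but still report errors.
    curl -fsS -o "$rel" "$BASE_URL/$rel"
  fi
}

# Usage, run from the directory that should hold the hub:
#   fetch_hub_file hub.txt
#   fetch_hub_file araTha1/<file>       # take real names from the listing
#   fetch_hub_file araTha1/bbi/<file>
```

Setting DRY_RUN=1 prints the curl commands instead of running them, which helps to confirm the target paths first.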

Once you have copied this assembly hub, follow the GBiB instructions on sharing the folder where it is located. First power off the machine, then select the Settings option for the machine in VirtualBox, click the "Shared Folders" tab and then the plus folder icon, as described here with pictures: http://genome.ucsc.edu/goldenPath/help/gbib#YourTracks

Then you can restart your GBiB and navigate to the folders location to see your shared files, http://127.0.0.1:1234/folders/, and then look for where you copied the hub.txt. In my example, this assembly hub.txt is in my shared Google Drive folder:

http://127.0.0.1:1234/folders/sf_Google_Drive/trackHubAssembly/hubExamples/hubAssembly/plantAraTha1/hub.txt

You can then paste your http://127.0.0.1:1234/folders/sf.../hub.txt address into the "My Hubs" tab of the hgHubConnect page, http://127.0.0.1:1234/cgi-bin/hgHubConnect, and click Connect. Click "submit" and you will be browsing this assembly hub example.
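
The two URL shapes used in this message can be assembled with a small helper. This is a sketch: the function names are made up for illustration, the sf_ prefix is taken from the example URL above and the folder name is assumed to contain no spaces. Passing a local /folders/ address as hubUrl in the direct hgHubConnect link is an assumption that was not tested here.

```shell
# Sketch only: build GBiB URLs for a hub.txt inside a VirtualBox shared folder.
# gbib_hub_url and gbib_connect_url are hypothetical helper names.
GBIB="http://127.0.0.1:1234"

# $1 = shared folder name as set in VirtualBox (no spaces assumed)
# $2 = path of hub.txt inside that folder
gbib_hub_url() {
  printf '%s/folders/sf_%s/%s\n' "$GBIB" "$1" "$2"
}

# $1 = full hub.txt URL; prints a link that connects the hub directly,
# using the same query parameters as the crocodile hub link above.
gbib_connect_url() {
  printf '%s/cgi-bin/hgHubConnect?hgHub_do_firstDb=on&hgHub_do_redirect=on&hgHubConnect.remakeTrackHub=on&hubUrl=%s\n' "$GBIB" "$1"
}

# Usage:
#   gbib_connect_url "$(gbib_hub_url hub hub.txt)"
```

The first function gives the address to paste under "My Hubs"; the second is an optional shortcut for opening the hub in one step.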

Now that you have this assembly hub working locally, you can edit your own assembly hub so that its parameters match. This plant assembly hub is a much smaller slice of a larger one, which you can explore to see the extended capabilities of assembly hubs: http://genome-test.cse.ucsc.edu/~hiram/hubs/Plants/

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group


--


Brian Lee

Nov 10, 2014, 5:51:45 PM
to David da Silva Pires, gen...@soe.ucsc.edu
Dear David,

Thank you for your message about loading an assembly hub in GBiB and the browser.

The MySQL commands aren't necessary to load your hub in the browser or GBiB when using assembly hubs. I apologize if my last email was lengthier than needed. From the description in your first email, it sounds as though you have done all the hard work already and only have to share your local files with your GBiB.

Follow the GBiB instructions on sharing the folder where your hub.txt is located. First power off the machine, then select the Settings option for the machine in VirtualBox, and click the "Shared Folders" tab and the plus folder icon, as described here with pictures: http://genome.ucsc.edu/goldenPath/help/gbib#YourTracks

Then you can restart your GBiB and navigate to the folders location to see your shared files, http://127.0.0.1:1234/folders/, and then look for your hub.txt. It might be in a folder like http://127.0.0.1:1234/folders/sf_hub/hub.txt based on sharing the hub folder in your first email:

hubs/
└── hub.txt

Paste that URL into the My Hubs tab on the Hubs page, http://127.0.0.1:1234/cgi-bin/hgHubConnect

If that fails to work, you can follow the steps in the previous email about using wget to copy an example assembly hub to your local computer.  I've tested using that assembly hub on my local GBiB and it should provide a positive test case before trying to debug your assembly hub.
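
Before debugging inside the browser, a quick local check that the files named in genomes.txt actually exist can rule out path mistakes. The function below is a sketch with a made-up name (check_hub_files); it only looks at the trackDb, twoBitPath, groups and htmlPath settings, skips values that are full URLs, and assumes paths without spaces. It is not a replacement for a full validation such as the hubCheck utility, if that is available among your tools.

```shell
# Sketch only: verify that files referenced from genomes.txt exist on disk.
# check_hub_files is a hypothetical helper; $1 is the directory with hub.txt.
check_hub_files() {
  hub_dir="$1"
  status=0
  # hub.txt names the genomes file relative to itself.
  genomes_file="$(awk '$1 == "genomesFile" { print $2 }' "$hub_dir/hub.txt")"
  if [ -z "$genomes_file" ] || [ ! -f "$hub_dir/$genomes_file" ]; then
    echo "missing genomesFile: $genomes_file"
    return 1
  fi
  # Paths in genomes.txt are relative to the genomes file itself.
  base="$(dirname "$hub_dir/$genomes_file")"
  for rel in $(awk '$1 ~ /^(trackDb|twoBitPath|groups|htmlPath)$/ { print $2 }' \
                   "$hub_dir/$genomes_file"); do
    case "$rel" in
      *://*) continue ;;            # remote URL: nothing to check locally
    esac
    if [ ! -f "$base/$rel" ]; then
      echo "missing: $base/$rel"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   check_hub_files hubs/helminths && echo "all referenced files found"
```

The function returns a non-zero status and prints one line per missing file, so it can also be used in a script before sharing the folder.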

Thank you again for trying out the GBiB and using the assembly hub features. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group




--

