Using "test" bacterial hubs to share data in collaborations?


Hendrickson, Curtis (Campus)

Sep 9, 2016, 3:00:03 PM
to Heater, Blair Delane, Hiram Clawson, gen...@soe.ucsc.edu, msp...@soe.ucsc.edu

Hiram

 

Thanks again for the help.

 

I had a bigger picture question about the status of your project to provide genomes for all the NCBI Genome Projects, and how to use them.

 

In some other work, I’m interested in using some of your bacterial track hubs:

Tb M37 http://genome-test.cse.ucsc.edu/gbdb/hubs/refseq/bacteria/18/GCF_000667805.1_Myco_tube_H37Rv_V2/

S. aureus RF12 http://genome-test.cse.ucsc.edu/gbdb/hubs/refseq/bacteria/29/GCF_000013425.1_ASM1342v1/

to share sequencing data with collaborators who are sequencing these organisms. I would create a track-only track hub that provides additional tracks for these genomes.

 

Is it “safe” to do this yet?

What is the best way to have more naïve users connect to these hubs?

  * Send them http://genome-test.cse.ucsc.edu/gbdb/hubs/refseq/bacteria/29/genomes.ncbi.txt ?

  * Is there any way to codify that dependency into my hub, so that your hub would be “auto-loaded” when mine gets loaded?

Is the recommended approach to download your hubs onto our servers? (which would allow us to support Blat…)

Is it equally safe to use the “ncbi” version (seq_name=NC_007795.1) or do I need to use the “ucsc” versions (seq_name=NC_007795v1), and re-map my data to that seq_name?

 

Regards,

Curtis

 

 

 

From: Heater, Blair Delane [mailto:bhe...@uab.edu]
Sent: Friday, September 09, 2016 12:51 PM
To: Hiram Clawson
Cc: gen...@soe.ucsc.edu; msp...@soe.ucsc.edu; Hendrickson, Curtis (Campus)
Subject: RE: How to Allow for Search of Gene and ID in TrackHub on UCSC Genome Browser

 

Dear Hiram,

 

Thank you for all the details and references. I will continue to look into the ‘bigGenePred’ data type and your indexing methods for streamlined conversions for viewing within the UCSC Genome Browser.  Based on your example and other documentation, I indexed the gene name in bed12 format when converting from bed to bigBed, supporting search by the gene names for each strain on our track hub in the genome browser. I appreciate your time and help.
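
For readers following along, the conversion described above could look roughly like the sketch below. It only assembles the bedToBigBed command line with an extra index on the name column. The file names (genes.sorted.bed, chrom.sizes, genes.bb) and the helper function are hypothetical, and the input BED is assumed to be sorted already. The trackDb stanza for the track would also need a "searchIndex name" line for the search to work.

```python
def bed_to_bigbed_command(bed_path, chrom_sizes_path, out_path,
                          index_fields=("name",)):
    """Build the argument list for bedToBigBed with an extra search index.

    All paths are placeholders; nothing is executed here.
    """
    return [
        "bedToBigBed",
        "-type=bed12",                              # plain 12-column BED input
        "-extraIndex=" + ",".join(index_fields),    # columns to index for search
        bed_path,
        chrom_sizes_path,
        out_path,
    ]


cmd = bed_to_bigbed_command("genes.sorted.bed", "chrom.sizes", "genes.bb")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where bedToBigBed is installed.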

 

Thank you,

Blair Heater

 

From: Hiram Clawson
Sent: Friday, September 2, 2016 3:16 PM
To: Heater, Blair Delane
Cc: gen...@soe.ucsc.edu; msp...@soe.ucsc.edu; Hendrickson, Curtis (Campus)
Subject: Re: How to Allow for Search of Gene and ID in TrackHub on UCSC Genome Browser

 

 

Good Afternoon Blair:

This track in the assembly hub is actually a 'bigGenePred' data type; it is not a bed file.
There is an index in the bigGenePred file on the 'name' column, and there is an additional
index of alias names for the genes in the GCF_000845245.1_ViralProj14559.ncbiGene.ix
and GCF_000845245.1_ViralProj14559.ncbiGene.ixx files.  Note the trackDb entry that specifies
all of this:

track ncbiGene
longLabel ncbiGene - gene predictions delivered with assembly from NCBI
shortLabel ncbiGene
priority 12
visibility pack
color 0,80,150
altColor 150,80,0
colorByStrand 0,80,150 150,80,0
bigDataUrl bbi/GCF_000845245.1_ViralProj14559.ncbiGene.ncbi.bb
type bigGenePred
html GCF_000845245.1_ViralProj14559.ncbiGene
searchIndex name
searchTrix GCF_000845245.1_ViralProj14559.ncbiGene.ix
url http://www.ncbi.nlm.nih.gov/nuccore/$$
urlLabel NCBI Nucleotide database
group genes

You can see all these files in the assembly hub directory:
   http://genome-test.cse.ucsc.edu/gbdb/hubs/refseq/viral/02/GCF_000845245.1_ViralProj14559/

The processing script that converted the NCBI GFF3 file into the genePred can
be found in:
 
http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob;f=src/hg/utils/automation/genbank/ncbiGene.sh

It uses the script gpToIx.pl to extract gene name aliases from the extra columns
in the genePred file:
 
http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob;f=src/hg/utils/automation/genbank/gpToIx.pl
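
A rough sketch of the alias-index step, not the actual gpToIx.pl logic: ixIxx expects one line per item, with the item identifier as the first word and the searchable words after it. The helper functions and file names below are hypothetical, and the ID/gene-name pairing in the example is assumed purely for illustration.

```python
def ix_input_line(item_name, aliases):
    """Return one ixIxx input line: '<item_name> <alias> <alias> ...'.

    Empty aliases and case-insensitive duplicates are dropped.
    """
    seen = set()
    words = []
    for alias in aliases:
        key = alias.strip()
        if not key or key.lower() in seen:
            continue
        seen.add(key.lower())
        words.append(key)
    return " ".join([item_name] + words)


def ixixx_command(text_path, ix_path, ixx_path):
    """Build the argument list for ixIxx (input text, .ix output, .ixx output)."""
    return ["ixIxx", text_path, ix_path, ixx_path]


# Illustrative pairing only; the real aliases come from the genePred columns.
line = ix_input_line("YP_081523.1", ["UL46", "ul46", ""])
print(line)
print(" ".join(ixixx_command("ncbiGene.txt", "ncbiGene.ix", "ncbiGene.ixx")))
```

The .ix file produced this way is what the searchTrix line in the trackDb entry points to.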

--Hiram

On 9/2/16 12:50 PM, Heater, Blair Delane wrote:
> Dear Hiram,
>
> I am the student intern in Biomedical Informatics at the Center for Clinical and Translational Science at UAB that worked with Curtis Hendrickson creating track hubs for new genomes on the UCSC Genome Browser. I have been absent over the summer but have returned to work this week.
>
> Curtis noticed that any search on gene name (e.g. UL46) or ID (e.g. YP_081523.1) would result in an error on our track hub. In an effort to determine how your track hub supported search on both gene name and ID, I converted your bigBed file, GCF_000845245.1_ViralProj14559.ncbiGene.ncbi.bb<http://genome-preview.cse.ucsc.edu/gbdb/hubs/refseq/viral/02/GCF_000845245.1_ViralProj14559/bbi/GCF_000845245.1_ViralProj14559.ncbiGene.ncbi.bb>, to bed format (the attached file) and am puzzled since it appears to have 18 columns. At first I thought it was bed detail format, but that doesn’t make sense, given the description of bed detail format in the UCSC genome format documentation.
>
> 1.       Do the final, additional columns of the bed file (which correspond to the fields type "Transcript type", geneName "Primary identifier for gene", geneName2 "Alternative/human readable gene name", and geneType "Gene type" from the bigBed file) facilitate the search for ID or gene name?
>
> 2.       How did you get this working when the format appears to contradict the UCSC genome bed or bed detail format documentation?
> I appreciate your time and help.
>
> Thank you,
> Blair Heater
>

Cath Tyner

Sep 14, 2016, 12:45:33 PM
to Hendrickson, Curtis (Campus), Heater, Blair Delane, gen...@soe.ucsc.edu
Hello Curtis,

Thank you again for using the UCSC Genome Browser and for making these inquiries. One of our engineers has shared the information below:

Regarding the bacterial track hubs: none of these track hubs are official, nor are they necessarily permanent. If you would like to use these data, you should download the directories of interest and host the assembly hub from your own web resources. You could then also run blat servers locally.
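
As a hedged illustration of hosting a downloaded copy yourself, the sketch below renders a partial genomes.txt stanza for an assembly hub, including the optional blat and transBlat lines that point at locally run gfServer instances. The host name, ports, and file paths are placeholders, and a real stanza would need further settings (for example groups, organism, and defaultPos) copied from the downloaded hub.

```python
def genome_stanza(genome, trackdb_path, two_bit_path,
                  blat_host=None, blat_port=None, trans_blat_port=None):
    """Render a partial assembly-hub genomes.txt stanza as text.

    The blat/transBlat lines are added only when a host and port are given.
    """
    lines = [
        f"genome {genome}",
        f"trackDb {trackdb_path}",
        f"twoBitPath {two_bit_path}",
    ]
    if blat_host and blat_port:
        lines.append(f"blat {blat_host} {blat_port}")
    if blat_host and trans_blat_port:
        lines.append(f"transBlat {blat_host} {trans_blat_port}")
    return "\n".join(lines) + "\n"


# Placeholder host, ports, and paths; adjust to the local copy of the hub.
stanza = genome_stanza(
    "GCF_000013425.1_ASM1342v1",
    "GCF_000013425.1_ASM1342v1/trackDb.txt",
    "GCF_000013425.1_ASM1342v1/genome.2bit",
    blat_host="blat.example.edu",
    blat_port=17779,
    trans_blat_port=17777,
)
print(stanza)
```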

Our engineering team is looking into developing scripts that would allow external sites such as yours to build new assembly hubs from NCBI data.

Regarding others using these data: we do not yet have a way to display track hubs on assembly hubs. Such work is in progress.

The 'NCBI' names are convenient and standard and appear to work in most cases, but some parts of the genome browser code are confused by the dots '.' in the names. This is an outstanding problem, and we do not have a solution yet. The workaround is to replace '.' with 'v', but the standard name is then lost for external annotators.
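
A minimal sketch of that workaround, assuming tab-separated BED-like data with the sequence name in the first column: it rewrites a trailing version suffix such as ".1" to "v1" (NC_007795.1 becomes NC_007795v1). The function names are hypothetical, and other file formats would need their own handling.

```python
import re

# Matches an accession followed by a trailing ".<digits>" version suffix.
_VERSION_RE = re.compile(r"^(.+)\.(\d+)$")


def ncbi_to_ucsc_name(name):
    """Turn 'NC_007795.1' into 'NC_007795v1'; leave other names unchanged."""
    match = _VERSION_RE.match(name)
    if not match:
        return name
    return f"{match.group(1)}v{match.group(2)}"


def rename_bed_line(line):
    """Rewrite the sequence name in the first column of one BED line.

    Blank lines, comments, and track/browser header lines pass through as-is.
    """
    if not line.strip() or line.startswith(("#", "track", "browser")):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    fields[0] = ncbi_to_ucsc_name(fields[0])
    return "\t".join(fields) + newline


print(ncbi_to_ucsc_name("NC_007795.1"))
```

Keeping the original NCBI names in a separate copy of the data avoids losing the standard name for tools that expect it.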

Please respond to this list if you have further questions!

Thank you again for your inquiry and for using the UCSC Genome Browser.
Please send new and follow-up questions to one of our UCSC Genome Browser mailing lists below:

  * Post to the Public Help Forum: Email gen...@soe.ucsc.edu or search the Public Archives
  * Post to the Mirror Help Forum: Email genome...@soe.ucsc.edu or search the Mirror Archives
  * Confidential/private help: Email genom...@soe.ucsc.edu

UCSC Genome Browser Announcements List (email alerts for new data & software):
  * Subscribe: Email genome-announce+subscribe@soe.ucsc.edu 
  * Unsubscribe: Email genome-announce+unsubscribe@soe.ucsc.edu

Join us on Social Media! Facebook, Twitter, Wordpress Blog, YouTube

​Enjoy,​

Cath
. . .
Cath Tyner
UCSC Genome Browser, Software QA & User Support
UC Santa Cruz Genomics Institute

