Dear UCSC,
We are working on creating a track hub for the virus Human Herpesvirus 5, based initially on the NCBI RefSeq entry NC_006273.2.
ftp://ftp.ncbi.nlm.nih.gov/genomes/Viruses/Human_herpesvirus_5_uid14559/
This organism has a single-segment genome (1 chromosome).
However, the FASTA file provided by NCBI RefSeq gives a “chromosome name” of
>gi|155573622|ref|NC_006273.2| Human herpesvirus 5 strain Merlin, complete genome
For compatibility with RefSeq, we created our hub .2bit file and .bigbed files using that same “chromosome” name, so that VCF files created by users who download their reference genome from RefSeq will work.
https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_hub/hub.txt
This hub passes muster with hubCheck (see below).
However, that seems to be causing some indigestion in the UCSC Browser.
I can get the display to come up, including the transcript track we provide, but every time we try to interact with it, it hangs.
Also, the only way to get to a display is to go through the “(sequences)” link.
Most other pathways end in an error window:
Warning/Error(s):
OK
Attempts to zoom show a chromosome name that is truncated after the “gi”, i.e., at the first pipe (see figure below).
Questions
1. Is our guess correct that the pipes (|) in the chromosome name are at fault?
2. Do you support some sort of “aliases” file like IGV does? (We use that heavily to allow us to use “nicer” chromosome names, while still being compatible with things computed with RefSeq acquired reference fastas)
Thanks!
Regards,
Curtis
Research Associate, Informatics Unit
Center for Clinical and Translational Science
University of Alabama at Birmingham
Tel: 205.975.5240
Email: cur...@uab.edu
# hub with |’s in chromosome name passes hubCheck
[curtish@cheaha tmp]$ rm -rf hubCheck ;(module load ngs-ccts/ucsc_kent/2014-03-05 ; hubCheck -verbose=2 -udcDir=hubCheck https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hub.txt)
udcfileOpen(https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hub.txt, hubCheck)
checking http remote info on https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hub.txt
bitmap file hubCheck/https/data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hub.txt/bitmap does not already exist, creating.
reading http/https/ftp data - 258 bytes at 0 - on https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hub.txt
udcfileOpen(https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/genomes.txt, hubCheck)
checking http remote info on https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/genomes.txt
bitmap file hubCheck/https/data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/genomes.txt/bitmap does not already exist, creating.
reading http/https/ftp data - 406 bytes at 0 - on https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/genomes.txt
hub https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hub.txt
shortLabel hh5Merlin2
longLabel Track hub for HCMV Strain Merlin based on gi|155573622|ref|NC_006273.2|
genomes.txt has 1 elements
udcfileOpen(https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/trackDb.txt, hubCheck)
checking http remote info on https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/trackDb.txt
bitmap file hubCheck/https/data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/trackDb.txt/bitmap does not already exist, creating.
reading http/https/ftp data - 143 bytes at 0 - on https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/trackDb.txt
checking _hh5Merlin2._hh5Merlin2_refseq_mrna type bigBed 12 at https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/NC_006273.2.mrna.bb
udcfileOpen(https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/NC_006273.2.mrna.bb, hubCheck)
checking http remote info on https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/NC_006273.2.mrna.bb
bitmap file hubCheck/https/data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/NC_006273.2.mrna.bb/bitmap does not already exist, creating.
reading http/https/ftp data - 8192 bytes at 0 - on https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/NC_006273.2.mrna.bb
[curtish@cheaha tmp]$
View after clicking on (Sequences)
Figure: zoom in to UL73 (right click on UL73 transcript, select “zoom to…”)
Browser hangs trying to zoom….
--
Hiram
Thank you very much!
That was, indeed, our fallback plan, to use the accession only, especially with gi numbers phasing out.
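For anyone following the same fallback, here is a minimal Python sketch of the "accession only" renaming, applied to FASTA headers before building the .2bit. It assumes the old-style `gi|number|db|accession|` header layout shown earlier in this thread; the function names are made up for illustration and are not part of any UCSC or NCBI tool.

```python
import re

# Old-style NCBI header: >gi|<number>|<db>|<accession.version>| description
GI_HEADER = re.compile(r"^>gi\|\d+\|[a-z]+\|([^|\s]+)\|\s*(.*)$")


def accession_only(header_line):
    """Return a FASTA header that keeps only the accession.version as the name."""
    header_line = header_line.rstrip("\n")
    match = GI_HEADER.match(header_line)
    if match is None:
        return header_line  # not a gi-style header; leave it untouched
    accession, description = match.groups()
    return f">{accession} {description}".rstrip()


def rewrite_fasta(lines):
    """Yield FASTA lines with gi-style headers reduced to the accession."""
    for line in lines:
        if line.startswith(">"):
            yield accession_only(line)
        else:
            yield line.rstrip("\n")
```

The rewritten header matches the new-style header NCBI already uses in the Assembly .fna files, so the pipes never reach the browser.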
However, from your URL, it looks like someone has already done a similar effort.
Can you point me to some reference for this project - looks like someone did/is doing all the RefSeq genomes in a systematic way?
http://genome-preview.cse.ucsc.edu/gbdb/hubs/refseq/viral/viral.ncbi.html
so the HH5/HCMV stub is
What is the status / roadmap for this project?
How is the GenBank -> BED file translation done for these?
We had looked around for GenBank -> BED file translators that would preserve CDS and exon structure, and hadn’t found anything we liked. Obviously we missed one.
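To make "preserve CDS and exon structure" concrete, here is a rough Python sketch of how one transcript could be written as a BED12 line once its exon and CDS coordinates have been pulled out of a GenBank record. This is only an illustration of the BED12 layout, not the method used for the UCSC viral hubs; the function names and the example coordinates are hypothetical.

```python
def genbank_to_bed_interval(start, end):
    """Convert a 1-based inclusive GenBank interval to 0-based half-open BED."""
    return start - 1, end


def bed12_line(chrom, name, strand, exons, cds=None):
    """Format one transcript as a BED12 line.

    exons: list of (start, end) pairs in BED coordinates (0-based, half-open).
    cds:   optional (start, end) of the coding region, same coordinates.
    """
    exons = sorted(exons)
    tx_start, tx_end = exons[0][0], exons[-1][1]
    # With no CDS, thickStart == thickEnd marks the feature as non-coding.
    thick_start, thick_end = cds if cds else (tx_start, tx_start)
    block_sizes = ",".join(str(end - start) for start, end in exons)
    block_starts = ",".join(str(start - tx_start) for start, _ in exons)
    fields = [chrom, tx_start, tx_end, name, 0, strand,
              thick_start, thick_end, "0", len(exons), block_sizes, block_starts]
    return "\t".join(str(field) for field in fields)
```

The exon structure ends up in the last three columns (blockCount, blockSizes, blockStarts) and the CDS in thickStart/thickEnd, which is what a "bigBed 12" track such as the mRNA track in our hub expects.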
Regards,
Curtis
-----Original Message-----
From: Hiram Clawson [mailto:hi...@soe.ucsc.edu]
Sent: Wednesday, April 06, 2016 4:28 PM
To: Curtis Hendrickson (Campus)
Cc: gen...@soe.ucsc.edu; Blair Delane Heater; Elliot J Lefkowitz
Subject: Re: [genome] adding a genome using a custom track hub - are pipe's allowed in chr names?
Good Afternoon Curtis:
Dear Matthew Speir,
Thanks for your help resolving the issue of the unsupported pipes. We attempted to configure our hh5Merlin2 track hub to show up under the Viruses option in the group pull-down menu on the Genome Browser Gateway. If we changed the shortLabel in hub.txt to any text other than ‘Viruses’, it worked and appeared in the drop-down menu. But when the shortLabel was set to ‘Viruses’, disconnecting and reconnecting the track hub didn’t give an error but reverted to the shortLabel of the previously connected hub. For the moment, we have set the shortLabel to Viruses-HH5(HCMV), as seen below, to associate the track hub with Viruses, since we can’t successfully place it within that option. Is it possible to set up the track hub under the ‘Viruses’ option in the group pull-down menu? If so, what steps are we missing? If not, is there any standard for group names?
Thank you,
Blair Heater
From: Matthew Speir
Sent: Wednesday, April 6, 2016 4:22 PM
To: Curtis Hendrickson (Campus); gen...@soe.ucsc.edu
Cc: Blair Delane Heater; Elliot J Lefkowitz
Subject: Re: [genome] adding a genome using a custom track hub - are pipe's allowed in chr names?
Matthew & Hiram,
A second question occurred to us after the problem with the pipes (|) in the chromosome names: is there a problem with using a period (.) in chromosome names that we have yet to trip over?
In our initial testing with chrom=“NC_006273.2”, things seem to work well.
We noticed that Hiram has converted RefSeq/NCBI’s “NC_006273.2” to “NC_006273v2” in part of his prototype (though perhaps we’re getting confused about the “ncbi” and “ucsc” versions of the project and which one is the “final” one), and that causes us to worry about using “.”.
If not, I’d like to make a general plea to keep things as compatible as possible with NCBI. I’ve seen a lot of people waste effort on other tools because of subtle, often unnecessary naming differences that make data produced using one version of a reference incompatible with another, essentially identical-except-for-names version of the same reference. Thus, it would be nice if a VCF file computed using a reference genome from NCBI would work directly against UCSC, without “chromosome name remapping” by the user.
Perhaps this issue is already addressed by the *.ncbiToUcsc.lift, *.ucsc.to.ncbi.fake.names and *ucscToNcbi.lift files as part of an alias system? Should we be including similar files in our hub/genome?
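Until we know whether such an alias mechanism applies to hubs, the remapping a user would otherwise have to do by hand looks roughly like the Python sketch below. It assumes the only difference between the two naming schemes is the “.” versus “v” before the version number, as seen in Hiram’s prototype; the function names are ours, not part of any UCSC tool.

```python
import re

CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def ucsc_style_name(ncbi_name):
    """'NC_006273.2' -> 'NC_006273v2'; names without a '.' are returned as-is."""
    accession, dot, version = ncbi_name.rpartition(".")
    return f"{accession}v{version}" if dot else ncbi_name


def rename_vcf_chroms(lines, aliases):
    """Yield VCF lines with CHROM values and contig IDs mapped through aliases."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##contig="):
            # Keep the contig header consistent with the renamed records.
            yield CONTIG_ID.sub(
                lambda m: m.group(1) + aliases.get(m.group(2), m.group(2)), line)
        elif line.startswith("#"):
            yield line  # other header lines pass through unchanged
        else:
            chrom, sep, rest = line.partition("\t")
            yield aliases.get(chrom, chrom) + sep + rest
```

A user would build the alias table once, e.g. `{name: ucsc_style_name(name) for name in ncbi_names}`, and run every NCBI-based VCF through `rename_vcf_chroms` before loading it, which is exactly the extra step we would like to spare them.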
With NCBI phasing out gi numbers, but not having yet removed them from the .fa and genomes .fna files they produce, I realize that 100% compatibility can be an elusive target to hit. It is nice that the Assembly project .fna’s that Hiram is working from already have the new fasta header format!
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000845245.1_ViralProj14559
>NC_006273.2 Human herpesvirus 5 strain Merlin, complete genome
>gi|155573622|ref|NC_006273.2| Human herpesvirus 5 strain Merlin, complete genome
>gi|155573622|ref|NC_006273.2| Human herpesvirus 5 strain Merlin, complete genome
Thanks again to Hiram for sharing his work. That’s a great project that will be really useful to a lot of people!
--
Brian
Thanks for the detailed help.
I can see why things that become paths/table/column names have very restricted character sets. I’m a little surprised that chromosome name in particular becomes a path/table/column name. I would have expected it to be a data value in a column/row, and thus exempt from these restrictions, but I’ve never studied your db schema for tracks/genomes.
Perhaps Hub documentation could be augmented to warn about these restrictions and recommend a standard way of mapping NCBI fasta names, so that most hubs will operate similarly.
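As an example of the kind of pre-flight check such documentation could suggest, here is a small Python sketch that flags FASTA sequence names containing characters outside a conservative set. The allowed set (letters, digits, underscore) is our own assumption, not a documented UCSC rule; it is deliberately strict, so it also flags the “.” that seemed to work in our testing.

```python
import re

# ASSUMPTION: letters, digits and underscore only. This is a conservative
# guess for illustration, not a documented UCSC restriction.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def risky_sequence_names(fasta_lines):
    """Yield (name, offending characters) for FASTA names outside SAFE_NAME."""
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        parts = line[1:].split()
        name = parts[0] if parts else ""
        if not SAFE_NAME.match(name):
            bad = sorted(set(re.findall(r"[^A-Za-z0-9_]", name)))
            yield name, "".join(bad)
```

Run against the RefSeq FASTA we started from, this would have reported the pipes before we built the .2bit and .bigBed files.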
I see why you decided to keep the existing groups “closed”. Again, a little more in the docs indicating which field will become that user-visible group name, and noting the restriction, would be great.
The shorter refresh rate will be useful too, and thanks also for the pointers to BLAT and GBiB. We might do BLAT, but not GBiB.
Regards,
Curtis
Matthew, Hiram,
We're trying to add a BLAT server to our viral custom track hub.
We set up the server on a publicly visible VM and it seems to be running fine (see ps on VM below).
But when I try to query it, I get errors, both from the command line and the website, and I can’t figure out how I’ve misconfigured it.
It looks like it’s slapping the URL of the seq_dir in front of the local path of the seq_dir, but I’m not sure where that’s coming from!
All help appreciated!
Regards,
Curtis
Research Associate, Informatics Unit
Center for Clinical and Translational Science
University of Alabama at Birmingham
Tel: 205.975.5240
Email: cur...@uab.edu
Hub: https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hub.txt
##
## UCSC blat error
##
Warning/Error(s):
##
## gfClient error
##
# telnet connection ok
curtish@genome-BMIlinux:~/vm-blat4trackhub-blat$ telnet 164.111.161.69 17777
Trying 164.111.161.69...
Connected to 164.111.161.69.
Escape character is '^]'.
^CConnection closed by foreign host.
# blat errors out
curtish@genome-BMIlinux:~/vm-blat4trackhub-blat$ ucsc_kent/2016-05-10/blat/gfClient 164.111.161.69 17777 data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2 test.fa test.psl
Expecting 6 words from server got 2
##
## gfServer running on VM, can see .2bit files
##
# process running with recommended flags
blat@blat4trackhubs6-open:~$ ps -eaf | grep gfServer
blat 23003 1 0 21:29 pts/1 00:00:00 ucsc_kent/2016-05-10/blat/gfServer start localhost 17777 -trans data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5BE_7_2011v1/KP745636v1.2bit data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/NC_006273v2.2bit
blat 23004 1 0 21:29 pts/1 00:00:00 ucsc_kent/2016-05-10/blat/gfServer start localhost 17779 -stepSize=5 data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5BE_7_2011v1/KP745636v1.2bit data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/NC_006273v2.2bit
blat 23827 21461 0 21:33 pts/1 00:00:00 grep --color=auto gfServer
# 2bit files fine
blat@blat4trackhubs6-open:~$ file data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5BE_7_2011v1/KP745636v1.2bit data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/NC_006273v2.2bit
data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5BE_7_2011v1/KP745636v1.2bit: data
data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/NC_006273v2.2bit: data
blat@blat4trackhubs6-open:~$
Hiram,
Thanks. The problem is that our hub has multiple genomes, each in its own subdirectory.
Essentially, for each genome we have a subdirectory with the 2bit and all the track files.
https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5BE_7_2011v1/KP745636v1.2bit
https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/NC_006273v2.2bit
I symlinked them all into one location on the BLAT VM, but I get an odd error:
when I BLAT against one genome, I get an error for the other.
Warning/Error(s):
Couldn't open https://data.genome.uab.edu/public/ucsc_track_hubs/hcmv_pub/hh5Merlin2/KP745636v1.2bit
-----Original Message-----
From: Hiram Clawson [mailto:hi...@soe.ucsc.edu]
Sent: Wednesday, May 25, 2016 6:10 PM
To: Curtis Hendrickson (Campus); Blair Delane Heater; Matthew Speir; gen...@soe.ucsc.edu
Cc: Elliot J Lefkowitz
Subject: Re: [genome] adding a genome using a custom track hub - are pipe's allowed in chr names?
Good Afternoon Curtis:
In-Silico PCR
I did Tools>In Silico PCR, and got
"Note: In-Silico PCR is not available for hub_86647_HH5 strain Merlin NC_006273v2; defaulting to Human Dec. 2013 (GRCh38/hg38)"
Is that easy to add? What is required? It looked from hubQuickStartAssembly that setting up the gfServers would enable both blat and In-Silico PCR.
hh5Merlin2.chrom.sizes
I notice Blair put the .chrom.sizes files at the top of the hub, not in the assembly/genome specific subdirectories.
Can they be moved into the subdirectories?
Cath
Actually, no, we’ve stayed “on list”, and I haven’t heard from Hiram since I sent the inquiry you forwarded below.
(You had mentioned he was on vacation).
I can re-send later today….
Curtis