Dear Hiram,
I am the student intern in Biomedical Informatics at the Center for Clinical and Translational Science at UAB who worked with Curtis Hendrickson on creating track hubs for new genomes on the UCSC Genome Browser. I was away over the summer but returned to work this week.
Curtis noticed that any search by gene name (e.g. UL46) or ID (e.g. YP_081523.1) results in an error on our track hub. To determine how your track hub supports search on both gene name and ID, I converted your bigBed file, GCF_000845245.1_ViralProj14559.ncbiGene.ncbi.bb, to BED format (the attached file), and I am puzzled because it appears to have 18 columns. At first I thought it was BED detail format, but that doesn’t fit the description of BED detail in the UCSC Genome Browser format documentation.
1. Do the final, additional columns of the BED file facilitate the search by ID or gene name? In the bigBed file these columns are described as "Transcript type"; string geneName, "Primary identifier for gene"; string geneName2, "Alternative/human readable gene name"; and string geneType, "Gene type".
2. How did you get this working when the format appears to contradict the UCSC documentation for the BED and BED detail formats?
I appreciate your time and help.
Thank you,
Blair Heater
Dear Hiram,
Thank you for all the details and references. I will continue to look into the ‘bigGenePred’ data type and your indexing methods to streamline conversions for viewing in the UCSC Genome Browser. Based on your example and other documentation, I indexed the gene name in bed12 format when converting from BED to bigBed, which now supports search by gene name for each strain on our track hub in the Genome Browser. I appreciate your time and help.
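The conversion step described above can be sketched roughly as follows. This is only a minimal illustration, not the exact commands used for our hub: the file names are hypothetical, and it assumes the gene name sits in the standard bed12 `name` column (column 4), that the index is built with the `-extraIndex` option of bedToBigBed, and that the hub's trackDb stanza exposes it with a `searchIndex` line.

```python
import shlex


def bed_to_bigbed_cmd(bed_path, chrom_sizes_path, out_path, index_fields=("name",)):
    """Build a bedToBigBed argument list for a bed12 file with extra indexes.

    Assumption: the input BED file is already sorted by chromosome and start
    (e.g. sort -k1,1 -k2,2n), which bedToBigBed requires.
    """
    return [
        "bedToBigBed",
        "-type=bed12",
        "-extraIndex=" + ",".join(index_fields),  # fields to make searchable
        bed_path,
        chrom_sizes_path,
        out_path,
    ]


def search_index_line(index_fields=("name",)):
    """Return the trackDb line that points the browser search at the indexes."""
    return "searchIndex " + ",".join(index_fields)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = bed_to_bigbed_cmd("strain.genes.bed12", "strain.chrom.sizes", "strain.genes.bb")
    print(shlex.join(cmd))       # command to run in a shell
    print(search_index_line())   # line to add to the track's trackDb stanza
```

The script only prints the command and the trackDb line rather than running anything, so it can be checked before being applied to each strain.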
Thank you,
Blair Heater