Hello,
I have been trying for a while to set up track search on our track hub for the latest Xenopus genomes, Tropicalis 9.0 (xt9.0) and Laevis 9.1 (xl9.1). I initially used the searchIndex method, creating the bigBed files with the extra parameters “-type=bed12 -extraIndex=name”. For some reason the search worked only for Tropicalis 9.0 (xt9.0) and not for Laevis 9.1 (xl9.1), and I don’t know why. Thinking I might have forgotten the extra parameter for xl9.1, I reran the ‘bedToBigBed’ utility to make sure the extraIndex was created each time, but no luck. In xt9.0 the search only worked if the exact gene symbol was typed in, not part of it, but I assume that with searchTrix it can find the location from a shorter prefix.
Anyway, for Laevis 9.1 it never seemed to work; occasionally I would get this error when trying a search term:
Then I assumed it was perhaps because I had multiple tracks in xl9.1 (both BAM and bigBed tracks) and also repeated the xl9.1 genome model track twice (top and bottom, for visual purposes), whereas in xt9.0 there is only one genome model track. Could the track search process get confused when there are multiple tracks, or when the same bigBed track is duplicated? I assume not, as the searchIndex line is only added to the stanzas belonging to the genome model bigBed tracks. Anyway, I tried removing all but one genome model bigBed track stanza so there would be no confusion, and the search still didn’t work.
I then tried the searchTrix method and, using the ixIxx utility, added index files for the genome model track. I added the searchTrix line to my genome model BED track stanza, but it still was not working properly. Perhaps I didn’t design my free-text file properly; I used this simple tab-delimited format, since I didn’t need any more text to link to the IDs:
gene_symbol1 gene_symbol1
gene_symbol2 gene_symbol2
….
I ran the ixIxx utility with a prefixSize of 3, assuming that this parameter would enable genes to be found even if only the first 3 characters of the gene symbol were typed in as a search term. This still didn’t work, even when the whole gene symbol was typed. I’m not sure what else to try.
My track hub URL is here:
ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/hub.txt
If you’d like to take a look at the configuration files in the browser, please look here:
ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/
Thanks,
Vaneet
Vaneet Lotay
Xenbase Bioinformatician
724 ICT Building - University of Calgary
2500 University Drive NW
Calgary AB T2N 1N4
CANADA
Dear Vaneet,
Thank you for using the UCSC Genome Browser Assembly Hubs feature and for your question about getting track search to work.
There may have been a missed detail when you built your bigBed file, perhaps the inclusion of the -as=fields.as option. I created some searchable bigBeds based on your original information that you can try with the following link: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_78165_xl9_1&hubUrl=http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/hub.txt
There are two bigBeds on this hub each with the 'searchIndex name' lines in trackDb.txt, where as you described, one must have the full exact name to find matches. On one you can search for "Xelaev18000001m" to see an example hit, on the other, which is slightly modified, you can search for "NEWINFO18000007m" to see a hit. There are also shared names like tm6sf2.1, which will find hits on both files.
To help expand the search hits on the first bigBed, it also includes a 'searchTrix outInfo.ix' line. This extra search index was built from a file in which each name, such as tm6sf2.1, was cut into progressively longer prefixes starting at three characters, for example "tm6sf2.1 tm6 tm6s tm6sf tm6sf2 tm6sf2. tm6sf2.1". You can see the input file here: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/info.txt
To create that file I used the following awk command, which prints the name field from the original BED, $4, followed by additional columns that build up the name one character at a time (awk's substr counts positions from 1):
cat ORG.annot.sorted.bed | awk '{print $4, substr($4, 1, 3), substr($4, 1, 4), substr($4, 1, 5), substr($4, 1, 6), substr($4, 1, 7), substr($4, 1, 8)}' > info.txt
You can see my notes here, which I'll also share below: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/NOTES
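For names longer than eight characters, a loop can emit every prefix up to the full name. The following is only a sketch of that idea, not the command used for the hub above: the function name make_trix_input and the min variable are made up for illustration, and it assumes the name is in column 4 of a whitespace-delimited BED file.

```shell
# Hypothetical helper: print each BED name (column 4) followed by all of
# its prefixes, from 3 characters up to the full name, one record per line.
make_trix_input() {
  awk -v min=3 '{
    name = $4
    line = name
    # substr() positions start at 1 in awk
    for (n = min; n <= length(name); n++)
      line = line " " substr(name, 1, n)
    print line
  }' "$1"
}

# Example use (file names taken from the steps in this thread):
#   make_trix_input ORG.annot.sorted.bed > info.txt
#   ixIxx info.txt outInfo.ix outInfo.ixx
```

Names shorter than three characters are written with no prefixes, so they would still need an exact match.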
1. With the original 2bit I used 'twoBitInfo' to create a chrom.sizes file needed in the bedToBigBed step:
twoBitInfo XL_9_1_scaffolds.2bit stdout | sort -k2rn > xl9_1.chrom.sizes
2. I converted your bigBed back to a BED file (for the later steps that build the index, and also to ensure it was sorted):
bigBedToBed XL9_1_annot_v1_8_2.bb ORG.annot.bed
sort -k1,1 -k2,2n ORG.annot.bed > ORG.annot.sorted.bed
3. I created a bed12.as file, assuming this was bed12 data, and built a new bigBed with -extraIndex=name:
bedToBigBed -as=bed12.as -type=bed12 -extraIndex=name ORG.annot.sorted.bed xl9_1.chrom.sizes searchableORG.sorted.bb
4. I used the above awk statement to create the info.txt file and then used ixIxx to generate the outInfo.ix and outInfo.ixx index files, with outInfo.ix referenced in trackDb.txt:
ixIxx info.txt outInfo.ix outInfo.ixx
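For reference, here is a sketch of how a trackDb.txt stanza might tie these files together. The track name, labels, and output file name trackDb.example.txt are placeholders, not the values from either hub; only the searchIndex and searchTrix lines and the bigBed and index file names come from the steps above, and the paths are assumed to be relative to trackDb.txt.

```shell
# Write a hypothetical stanza to an example file (all names are placeholders
# except the bigBed file, the .ix file, and the two search settings).
cat > trackDb.example.txt <<'EOF'
track xl9_1_geneModels
shortLabel XL9.1 Gene Models
longLabel Xenopus laevis 9.1 gene models (searchable)
type bigBed 12
bigDataUrl searchableORG.sorted.bb
searchIndex name
searchTrix outInfo.ix
EOF
```

The searchIndex line covers exact-name lookups through the index built by -extraIndex=name, while the searchTrix line points at the ixIxx output used for prefix searches; the outInfo.ixx file is expected to sit next to outInfo.ix.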
Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genomics Institute
--
Hey Brian,
Thank you for all the helpful tips. I believe I did most of the steps you laid out the first time except I didn’t create an .as file as I believed I had the standard bed12 format and also may or may not have sorted the BED file. Also your process of creating the .ix index files seems much more extensive so would be better suited to find search hits.
So I tried again, this time including the steps I hadn’t done the first time (the .as file and the index files), and it still didn’t work. I made sure the BED was sorted; I even replicated what you did, extracting the BED from the bigBed and then regenerating the bigBed, since for some reason the order of scaffolds is different that way, but it’s still not working. Since you’re able to create these files and run them on another track hub, is there a corrupt file in my track hub’s xl9_1 folder that’s stopping it from working? Does it matter that I have BAM tracks included as well?
Here’s the error I get when I attempt a search:
Warning/Error(s):
Do you know what could be causing this?
Would appreciate any help….thanks!
Vaneet
Hey Brian,
I made those changes and that error doesn’t seem to appear anymore. However when I search for part of a gene name and the TRIX functionality brings up a page of possible hits, when I click on any of the hits it brings up this type of error:
Does this mean the gene model in question is outside of the range of my scaffold in terms of the genome sequence? Because I’m not sure why I’d get that error as it’s never been an issue in other genome browsers. Would appreciate any help.
Thanks,
Dear Vaneet,
Thank you for using the UCSC Genome Browser and the search feature with your Assembly Track Hub.
I suspect there is an error in the process of building your bigBed. I tested a version of the hub I have here, http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/hub.txt, and when I load my hub and search for the gene "cox", one of the top hits is for cox8a.S on chr4S, and my link works:
hub_664_steps
cox8a.S at chr4S:6756390-6759315
However, when I load your hub, ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/hub.txt, and do the same search for "cox", I get cox8a.S located on Scaffold102 instead of chr4S:
hub_665_XL_9_1_Gene_Model_top
cox8a.S at Scaffold102:6756390-6759315
While it appears the coordinates are correct, 6756390-6759315, the chromosome name is incorrect, Scaffold102 vs chr4S, and Scaffold102 is only 335,437 bases long, explaining the error message about the location being out of range.
Perhaps you should check the chrom.sizes file used to build the bigBed. You can obtain it from your 2bit with a utility called twoBitInfo, sorting the results as in my notes: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/NOTES
twoBitInfo XL_9_1_scaffolds.2bit stdout | sort -k2rn > xl9_1.chrom.sizes
A precompiled version of twoBitInfo can be found for your system here: http://hgdownload.soe.ucsc.edu/admin/exe/
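One quick sanity check along these lines is to confirm that every feature in the BED file lies on a sequence listed in chrom.sizes and ends within that sequence's length. This is a rough sketch with a made-up function name (check_bed_against_sizes), and it assumes a two-column chrom.sizes file and the name in column 4 of the BED. Since it reads the BED, it would not by itself catch a problem introduced later, while the bigBed's name index is built.

```shell
# Hypothetical helper: report BED features whose sequence name is missing
# from chrom.sizes or whose end coordinate exceeds the sequence length.
# Usage: check_bed_against_sizes xl9_1.chrom.sizes ORG.annot.sorted.bed
check_bed_against_sizes() {
  awk '
    # First file: remember the length of every sequence.
    NR == FNR { size[$1] = $2 + 0; next }
    # Second file: flag features on unknown sequences.
    !($1 in size) {
      printf "unknown sequence: %s (%s)\n", $1, $4
      bad = 1
      next
    }
    # Flag features that run past the end of their sequence.
    $3 + 0 > size[$1] {
      printf "out of range: %s at %s:%s-%s (sequence length %d)\n", $4, $1, $2, $3, size[$1]
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1" "$2"
}
```

The function exits non-zero if any feature is flagged, so it can gate a build script before bedToBigBed is run.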
Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genomics Institute
Hey Brian,
I noticed that error, and something is definitely wrong with which scaffolds come up in the search hits. I see that the instructions sort the chrom.sizes file with this command:
twoBitInfo XL_9_1_scaffolds.2bit stdout | sort -k2rn > xl9_1.chrom.sizes
However, this would sort the scaffolds by size, and the BED file for my Laevis 9.1 gene model is certainly not sorted by size but by scaffold name and then start position. Does the order of scaffolds in the chrom.sizes file have to match the order of scaffolds in my BED file exactly? If so, I can re-sort the chrom.sizes file.
Hi Vaneet,
Thank you for using the UCSC Genome Browser and for your question about the sorting of the chrom.sizes.
The chrom.sizes file doesn't need to be sorted; it is merely sorted by convention and for convenience, and there is no downside to sorting it. Your BED file, however, should be sorted by chrom and then chromStart, sort -k1,1 -k2,2n unsorted.bed > input.bed, before making your bigBed file.
Can you please try repeating the steps I've shared earlier to see if you can create a working example: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/NOTES
I am confident the problem is in your bigBed: ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/xl9_1/XL9_1_annot_v1_8_2.bb You can try to rebuild it with this command:
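To confirm a BED file is already in that order without rewriting it, sort's check mode can be used. A small sketch follows; the function name bed_is_sorted is made up, and forcing LC_ALL=C is an assumption made here so that names compare byte by byte regardless of the local locale settings.

```shell
# Hypothetical helper: succeed only if the BED file is already sorted by
# chrom (column 1) and then numerically by chromStart (column 2).
bed_is_sorted() {
  LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

# Example use:
#   bed_is_sorted input.bed || LC_ALL=C sort -k1,1 -k2,2n input.bed > input.sorted.bed
```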
bedToBigBed -as=bed12.as -type=bed12 -extraIndex=name ORG.annot.sorted.bed xl9_1.chrom.sizes searchableORG.sorted.bb
If it is helpful, you can use these files as inputs, derived from your original files:
http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/ORG.annot.sorted.bed
http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/xl9_1.chrom.sizes
http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/bed12.as
And then replace XL9_1_annot_v1_8_2.bb in your trackDb.txt with the new file searchableORG.sorted.bb. I believe when you built the original bigBed to be indexed on name, the indexes were constructed incorrectly for XL9_1_annot_v1_8_2.bb.
All messages sent to gen...@soe.ucsc.edu are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genomics Institute
That finally worked, thanks Brian. I suppose something consistent must’ve been going wrong every time I built my bigBed.
Vaneet
Hey Brian,
I noticed the UCSC genome browser website and interface has been revamped and updated, looks good! I had built a trackhub for Xenopus genomes that could be loaded on the UCSC genome browser and last I checked it on the old interface, there were no obvious errors and every track loaded fine. Now when I try to load the trackhub I run into an error.
I’m not sure I’m going to the right link, but I hover over My Data and then click on Track Hubs, then paste the URL to our track hub’s hub.txt file. The initial status page with the track hub name and genomes looks fine, but when it transitions to the genome selector page I get a pop-up box with this:
genome.ucsc.edu says: handleRefreshState: empty response from server
If I click on Genome Browser all the tracks appear and load correctly, just not sure why I get the error box when trying to visit the Genomes page to select which genome to display from my trackhub.
Is there something that’s changed in the new interface that I need to change in my trackhub configuration?
Thanks,
Vaneet
From: Vaneet Lotay
Sent: Thursday, March 10, 2016 2:42 PM
To: 'Brian Lee' <bria...@soe.ucsc.edu>
Cc: gen...@soe.ucsc.edu
Subject: RE: [genome] UCSC trackhub track search working
That finally worked, thanks Brian. I suppose something consistent must’ve been going wrong every time I built my bigBed.
--
Hey Brian,
I was replacing the bigBed file with a new version of the Xenopus Laevis 9.1 genome model, and I’ve encountered the same errors again with gene search that you helped me with in the previous iteration of this:
Warning/Error(s):
OK
My UCSC trackhub is hosted at this address here:
ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/hub.txt
You can even see the clear difference where I’ve replaced the XL9.1 bigbed track on the ‘top’ and I still have the old sorted bigBed track on the ‘bottom’ that you gave me. If you click on the search results from the ‘bottom’ track no errors are shown but from the ‘top’ track it displays the error shown above with the wrong Scaffold usually.
I still don’t know why this error happens, you believed there was something wrong with how I build my bigBed which is probably right, but we could never pinpoint what exactly. To save a lot of emails this time, can you simply do what you did last time and take my current bigBed file (XL9_1v3_2_primary.bb), go back to the BED source file and build it again into a successful working .bb file for me to use? I would really appreciate it.
Thanks,
Vaneet
Can you help me out with this, Brian? Or if anyone else can, that would be great, thanks!
Vaneet
From: Vaneet Lotay
Sent: Tuesday, July 26, 2016 5:02 PM
To: 'Brian Lee' <bria...@soe.ucsc.edu>
Cc: 'gen...@soe.ucsc.edu' <gen...@soe.ucsc.edu>
Subject: RE: [genome] UCSC trackhub track search working
Hey Brian,
I was replacing the bigBed file with a new version of the Xenopus Laevis 9.1 genome model, and I’ve encountered the same errors again with gene search that you helped me with in the previous iteration of this:
Warning/Error(s):
OK
My UCSC trackhub is hosted at this address here:
ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/hub.txt
You can even see the clear difference where I’ve replaced the XL9.1 bigbed track on the ‘top’ and I still have the old sorted bigBed track on the ‘bottom’ that you gave me. If you click on the search results from the ‘bottom’ track no errors are shown but from the ‘top’ track it displays the error shown above with the wrong Scaffold usually.
I still don’t know why this error happens, you believed there was something wrong with how I build my bigBed which is probably right, but we could never pinpoint what exactly. To save a lot of emails this time, can you simply do what you did last time and take my current bigBed file (XL9_1v3_2_primary.bb), go back to the BED source file and build it again into a successful working .bb file for me to use? I would really appreciate it.
Thanks,
Vaneet
wget http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/xl9_1.chrom.sizes
wget http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/bed12.as
bigBedToBed XL9_1v3_2_primary.bb original.bed
bedToBigBed -as=bed12.as -type=bed12 -extraIndex=name original.bed xl9_1.chrom.sizes newXL9_1v3_2_primary.bb