UCSC trackhub track search working


Vaneet Lotay

Jan 29, 2016, 5:34:28 PM
to gen...@soe.ucsc.edu

Hello,

 

I have been trying for a while to set up track search on our trackhub for the latest Xenopus genomes, Tropicalis 9.0 (xt9.0) and Laevis 9.1 (xl9.1).  I initially used the searchIndex method, creating the bigBed files with the extra parameters “-type=bed12 -extraIndex=name”.  For some reason the search worked only for Trop 9 (xt9.0) but not for Laevis 9.1 (xl9.1), and I don’t know why.  Assuming I had perhaps forgotten to enter the extra parameter for XL9.1, I reran the ‘bedToBigBed’ utility to make sure that the extraIndex was created each time, but no luck.  In XT9.0 the search only worked if the exact gene symbol was typed in, not just part of it, but I assume that with searchTrix it can pick up the location from a shorter prefix.

 

Anyway, for Laevis 9.1 it never seemed to work. Occasionally I would get this error when trying a search term:

 

  • no track for table "hub_47641_XL_9.1_Gene_Model_top" found via a findSpec

Then I assumed it was perhaps because I had multiple tracks in XL9.1 (both BAM and bigBed tracks), as well as the XL9.1 genome model track repeated twice (top and bottom for visual purposes), whereas in XT9.0 there is only one genome model track.  Could the track search process get confused when there are multiple tracks or the same bigBed track is duplicated?  I assume not, as the searchIndex line is only added to the stanzas belonging to the genome model bigBed tracks.  Anyway, I tried removing all but the one genome model bigBed track stanza so there’s no confusion, and still the search didn’t work.

 

I then tried the searchTrix method and, using the ixIxx utility, added index files for the genome model track.  I added the searchTrix line to my genome model BED track stanza, but it was still not working properly.  Perhaps I didn’t design my free text file properly; I simply used this tab-delimited format, since I didn’t need any more text to link to the IDs:

 

gene_symbol1  gene_symbol1

gene_symbol2  gene_symbol2

….

 

I ran the ixIxx utility with a prefixSize of 3, assuming that this parameter would enable genes to be found even if the first 3 characters of the gene symbol were typed in as a search term.  Still this didn’t work even if the whole gene symbol was typed.  Not sure what else to try.

 

My track hub URL is here:

 

ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/hub.txt 

 

If you’d like to take a look at the configuration files in the browser, please look here:

 

ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/

 

 

Thanks,

 

Vaneet

 

Vaneet Lotay

Xenbase Bioinformatician

724 ICT Building - University of Calgary

2500 University Drive NW

Calgary AB T2N 1N4

CANADA

 

Brian Lee

Feb 2, 2016, 3:16:40 PM
to Vaneet Lotay, gen...@soe.ucsc.edu

Dear Vaneet,

Thank you for using the UCSC Genome Browser Assembly Hubs feature and your question about enabling track search to work.

There may have been a missed detail when building your bigBed file, perhaps the inclusion of the -as=fields.as option. I created some searchable bigBeds based on your original information that you can try with the following link: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_78165_xl9_1&hubUrl=http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/hub.txt

There are two bigBeds on this hub each with the 'searchIndex name' lines in trackDb.txt, where as you described, one must have the full exact name to find matches. On one you can search for "Xelaev18000001m" to see an example hit, on the other, which is slightly modified, you can search for "NEWINFO18000007m" to see a hit. There are also shared names like tm6sf2.1, which will find hits on both files.

To help expand the search hits on the first bigBed, it also includes a 'searchTrix outInfo.ix' line. This extra search index was built from a file where each name, like tm6sf2.1, was cut up into progressively longer pieces starting at just three characters, for example "tm6sf2.1 tm6 tm6s tm6sf tm6sf2 tm6sf2. tm6sf2.1". You can see the input file here: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/info.txt

To create that file I used the following awk command to output the name field from the original bed, $4, followed by additional columns that slowly build up more characters:
awk '{print $4, substr($4, 1, 3), substr($4, 1, 4), substr($4, 1, 5), substr($4, 1, 6), substr($4, 1, 7), substr($4, 1, 8)}' ORG.annot.sorted.bed > info.txt
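
As a rough sketch, the same idea can be written as a loop so that the prefix lengths are not hard-coded. The function name name_prefixes and its two arguments (shortest and longest prefix) are made up for illustration; unlike the one-liner above, this version stops at the full name instead of repeating it for short names. awk's substr counts characters from 1.

```shell
# name_prefixes MIN MAX: read BED lines on stdin and print, for each item,
# the name (column 4) followed by its prefixes of length MIN..MAX.
# MIN and MAX are illustrative parameters (defaults 3 and 8).
name_prefixes() {
    awk -v min_len="${1:-3}" -v max_len="${2:-8}" '{
        line = $4
        # substr() is 1-based in awk; never go past the end of the name
        for (n = min_len; n <= max_len && n <= length($4); n++)
            line = line " " substr($4, 1, n)
        print line
    }'
}

# Example with one BED line (coordinates are placeholders):
printf 'chr1L\t100\t200\ttm6sf2.1\n' | name_prefixes 3 8
# prints: tm6sf2.1 tm6 tm6s tm6sf tm6sf2 tm6sf2. tm6sf2.1
```

With the real data this would be run as name_prefixes 3 8 < ORG.annot.sorted.bed > info.txt.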

The notes I followed are here, and I'll share them below: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/NOTES

1. With the original 2bit I used 'twoBitInfo' to create a chrom.sizes file needed in the bedToBigBed step:

twoBitInfo XL_9_1_scaffolds.2bit stdout | sort -k2rn > xl9_1.chrom.sizes

2. I converted your bigBed back to a BED file (for the later steps that build the index, and also to ensure it was sorted).

bigBedToBed XL9_1_annot_v1_8_2.bb ORG.annot.bed
sort -k1,1 -k2,2n ORG.annot.bed  > ORG.annot.sorted.bed

3. I created a bed12.as file, assuming this was bed12 data, and built a new bigBed using -extraIndex=name:

bedToBigBed -as=bed12.as -type=bed12 -extraIndex=name ORG.annot.sorted.bed xl9_1.chrom.sizes searchableORG.sorted.bb

4. I used the above awk statement to create the info.txt file, and then used ixIxx to generate the outInfo.ix and outInfo.ixx index files for the searchTrix line in trackDb.txt:

ixIxx info.txt outInfo.ix outInfo.ixx
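
For reference, a trackDb.txt stanza that ties these files together might look roughly like the sketch below. Only the 'searchIndex name' and 'searchTrix outInfo.ix' lines and the file names come from the steps above; the track name and labels are placeholders, and the sketch assumes outInfo.ixx sits in the same directory as outInfo.ix.

```shell
# example_stanza: print an illustrative trackDb.txt stanza for a searchable
# bigBed. Track name and labels are hypothetical placeholders.
example_stanza() {
    cat <<'EOF'
track XL_9_1_Gene_Model
bigDataUrl searchableORG.sorted.bb
shortLabel XL9.1 genes
longLabel Xenopus laevis 9.1 gene models
type bigBed 12
searchIndex name
searchTrix outInfo.ix
EOF
}

example_stanza
```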

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genomics Institute




Vaneet Lotay

Feb 4, 2016, 2:58:21 PM
to Brian Lee, gen...@soe.ucsc.edu

Hey Brian,

 

Thank you for all the helpful tips.  I believe I did most of the steps you laid out the first time, except that I didn’t create an .as file, as I believed I had the standard bed12 format, and I may or may not have sorted the BED file.  Also, your process of creating the .ix index files seems much more extensive, so it should be better suited to finding search hits.

 

So I tried it again, now including the steps that I didn’t do the first time (the .as file and the index files), and it still didn’t work.  I made sure the BED was sorted; I even replicated what you did, extracting the BED from the bigBed and then regenerating the bigBed, as for some reason the order of scaffolds is different that way, but it is still not working.  Since you’re able to create these files and run them on another track hub, is there a corrupt file in my trackhub xl9_1 folder that’s stopping it from working?  Does it matter that I have BAM tracks included as well?

 

Here’s the error I get when I attempt a search:

 

http://genome.ucsc.edu/cgi-bin/hgTracks?hgtgroup_other_close=0&hgsid=472684639_AKhRHFGE6CrAdHcF7XI4dBq5h6lG&position=pax6&hgt.positionInput=pax6&hgt.jump=go&db=hub_47641_xl9_1&c=chr1L&l=74759589&r=74773706&pix=1900&dinkL=2.0&dinkR=2.0

 

Warning/Error(s):

  • no track for table "hub_47641_XL_9.1_Gene_Model_top" found via a findSpec

 

 

Do you know what could be causing this?

 

Would appreciate any help….thanks!

 

Vaneet

Brian Lee

Feb 4, 2016, 4:43:28 PM
to Vaneet Lotay, gen...@soe.ucsc.edu
Dear Vaneet,

Thank you for using the UCSC Genome Browser and your follow-up about employing search in your hub.

You have helped uncover an issue: the code that was enhanced to handle hub track names may need some further adjustment for when hub tracks are used with the search function. There is a quick fix available, though. Change any periods in your track names to underbars: XL_9.1_Gene_Model_top ---to---> XL_9_1_Gene_Model_top

Ideally, track names in hubs should not have periods in them; our documentation states that they "must begin with a letter and contain only the following chars: [a-zA-Z0-9_]." Part of the reason for these restrictions is that SQL doesn't allow table names with periods in them, because a period is how one separates a database from a table in text like "dbName.tableName", and previously hubs would not work with periods in the track names. However, we have found that users often instinctively use periods in names, so we adapted the code to turn the periods into underbars. It looks like some further enhancements may be needed so that this conversion also applies when users employ the search feature on hubs.
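
A small sketch of how one might check and fix track names before publishing a hub, based only on the rule quoted above. The function names sanitize_track_name and is_valid_track_name are made up for this example and are not part of any UCSC tool.

```shell
# sanitize_track_name NAME: replace every character outside [a-zA-Z0-9_]
# with an underscore.
sanitize_track_name() {
    printf '%s\n' "$1" | sed 's/[^a-zA-Z0-9_]/_/g'
}

# is_valid_track_name NAME: succeed only if NAME begins with a letter and
# contains only characters from [a-zA-Z0-9_].
is_valid_track_name() {
    printf '%s\n' "$1" | grep -Eq '^[a-zA-Z][a-zA-Z0-9_]*$'
}

sanitize_track_name "XL_9.1_Gene_Model_top"   # prints XL_9_1_Gene_Model_top
```

Note that sanitize_track_name does not fix a name that starts with a digit or underscore; is_valid_track_name would still reject it.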

Thank you again for your inquiry and using the UCSC Genome Browser and assembly hubs tools! If you have any further questions, please reply to ge...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.


All the best,

Brian Lee
UCSC Genomics Institute



Vaneet Lotay

Feb 14, 2016, 4:52:10 PM
to Brian Lee, gen...@soe.ucsc.edu

Hey Brian,

 

I made those changes and that error doesn’t seem to appear anymore.  However, when I search for part of a gene name, the TRIX functionality brings up a page of possible hits, and clicking on any of the hits brings up this type of error:

 

  • Window out of range on Scaffold101

Does this mean the gene model in question is outside of the range of my scaffold in terms of the genome sequence?  Because I’m not sure why I’d get that error as it’s never been an issue in other genome browsers.  Would appreciate any help.

 

Thanks,

Brian Lee

Feb 17, 2016, 1:10:49 PM
to Vaneet Lotay, gen...@soe.ucsc.edu

Dear Vaneet,

Thank you for using the UCSC Genome Browser and the search feature with your Assembly Track Hub.

I suspect there is an error in the process of building your bigBed. I tested a version of the hub I have here, http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/hub.txt, and when I load my hub and search for the gene "cox", one of the top hits is for cox8a.S on chr4S, and my link works:

hub_664_steps
cox8a.S at chr4S:6756390-6759315

However, when I load your hub, ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/hub.txt, and do the same search for "cox", I get cox8a.S located on scaffold Scaffold102 instead of chr4S:

hub_665_XL_9_1_Gene_Model_top
cox8a.S at Scaffold102:6756390-6759315

While it appears the coordinates are correct, 6756390-6759315, the chromosome name is incorrect, Scaffold102 vs chr4S, and Scaffold102 is only 335,437 bases long, explaining the error message about the location being out of range.
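
One way to rule out the input files as the source of such a mismatch is to check every BED item against the chrom.sizes file. The sketch below uses only awk; the function name check_bed_ranges is made up for this example. It inspects the inputs only and says nothing about the name index stored inside an existing bigBed.

```shell
# check_bed_ranges CHROM_SIZES BED
# Print every BED line whose chrom is missing from CHROM_SIZES or whose
# chromEnd (column 3) is larger than that chrom's size.
check_bed_ranges() {
    awk '
        NR == FNR { size[$1] = $2; next }            # first file: chrom.sizes
        !($1 in size) { print "unknown chrom: " $0; next }
        $3 + 0 > size[$1] + 0 { print "out of range: " $0 }
    ' "$1" "$2"
}

# Possible usage with the files named in this thread:
# check_bed_ranges xl9_1.chrom.sizes ORG.annot.sorted.bed
```

No output means every item fits inside its chromosome or scaffold.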

Perhaps you should check your chrom.sizes file, used to build the bigBed. You can obtain it from your 2bit with a utility called twoBitInfo, but the results need to be sorted as well: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/NOTES

twoBitInfo XL_9_1_scaffolds.2bit stdout | sort -k2rn > xl9_1.chrom.sizes

A precompiled version of twoBitInfo can be found for your system here: http://hgdownload.soe.ucsc.edu/admin/exe/

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genomics Institute

Vaneet Lotay

Feb 19, 2016, 2:24:36 PM
to Brian Lee, gen...@soe.ucsc.edu

Hey Brian,

 

I noticed that error, and something is definitely wrong with which scaffolds are coming up in the search hits.  I see that the instructions sort the chrom.sizes file with this command:

twoBitInfo XL_9_1_scaffolds.2bit stdout | sort -k2rn > xl9_1.chrom.sizes

However, this would sort the scaffolds by size, and the BED file for my Laevis 9.1 gene model is certainly not sorted by size but by scaffold name and then start position.  Does the order of scaffolds in the chrom.sizes file have to match exactly the order of scaffolds in my BED file?  If so, I can re-sort the chrom.sizes file.

Brian Lee

Feb 23, 2016, 7:58:41 PM
to Vaneet Lotay, gen...@soe.ucsc.edu

Hi Vaneet,

Thank you for using the UCSC Genome Browser and your question about the sorting of the chrom.sizes.

The chrom.sizes file doesn't need to be sorted; it is sorted merely by convention and for convenience, and there is no downside to sorting it. Your bed file, however, should be sorted by chrom and then chromStart (sort -k1,1 -k2,2n unsorted.bed > input.bed) before you make your bigBed file.
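
To confirm that a BED file is already in that order, sort's check mode can be used, as in the sketch below. The function name bed_is_sorted is made up for this example. It assumes byte-wise (LC_ALL=C) ordering of chrom names and a sort that supports -s, which stops ties on chrom and chromStart from being reported as disorder.

```shell
# bed_is_sorted BED: succeed (exit 0) if BED is sorted by chrom, then by
# chromStart; fail otherwise. -c checks without writing output, -s disables
# the last-resort whole-line comparison for items with equal keys.
bed_is_sorted() {
    LC_ALL=C sort -c -s -k1,1 -k2,2n "$1" 2>/dev/null
}

# Possible usage:
# bed_is_sorted input.bed && echo "sorted" || echo "needs sorting"
```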

Can you please try repeating the steps I've shared earlier to see if you can create a working example: http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/NOTES

I am confident the problem is in your bigBed: ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/xl9_1/XL9_1_annot_v1_8_2.bb You can try to rebuild it with this command:

bedToBigBed -as=bed12.as -type=bed12 -extraIndex=name ORG.annot.sorted.bed xl9_1.chrom.sizes searchableORG.sorted.bb

If it is helpful, you can use these files as inputs, derived from your original files:
http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/ORG.annot.sorted.bed

http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/xl9_1.chrom.sizes

http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/bed12.as

And then replace XL9_1_annot_v1_8_2.bb in your trackDb.txt with the new file searchableORG.sorted.bb. I believe when you built the original bigBed to be indexed on name, the indexes were constructed incorrectly for XL9_1_annot_v1_8_2.bb.

All messages sent to gen...@soe.ucsc.edu are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genomics Institute

Vaneet Lotay

Mar 10, 2016, 4:44:52 PM
to Brian Lee, gen...@soe.ucsc.edu

That finally worked, thanks Brian.  I suppose something consistent must’ve been going wrong every time I built my bigBed.

 

Vaneet

 

Vaneet Lotay

Xenbase Bioinformatician

724 ICT Building - University of Calgary

2500 University Drive NW

Calgary AB T2N 1N4

CANADA

 

Vaneet Lotay

May 11, 2016, 3:56:32 PM
to Brian Lee, gen...@soe.ucsc.edu

Hey Brian,

 

I noticed the UCSC genome browser website and interface has been revamped and updated, looks good!   I had built a trackhub for Xenopus genomes that could be loaded on the UCSC genome browser and last I checked it on the old interface, there were no obvious errors and every track loaded fine.  Now when I try to load the trackhub I run into an error.

 

Not sure if I’m going to the right link, but I hover over My Data and then click on Track Hubs.  Then I paste the URL to our trackhub hub.txt file. The initial status page with the trackhub name and genomes looks fine, but when it transitions to the genome selector page I get a pop-up box with this:

 

genome.ucsc.edu says: handleRefreshState: empty response from server

 

If I click on Genome Browser all the tracks appear and load correctly, just not sure why I get the error box when trying to visit the Genomes page to select which genome to display from my trackhub.

 

Is there something that’s changed in the new interface that I need to change in my trackhub configuration?

 

Thanks,

 

Vaneet

 

Vaneet Lotay

Xenbase Bioinformatician

403-220-6652

724 ICT Building - University of Calgary

2500 University Drive NW

Calgary AB T2N 1N4

CANADA

 


Matthew Speir

May 17, 2016, 12:40:07 PM
to Vaneet Lotay, Brian Lee, gen...@soe.ucsc.edu
Hi Vaneet,

Thank you for this bug report.

We have fixed this issue on our test server, http://genome-test.soe.ucsc.edu, and you should be able to load your hub there without any issues. If you would like to continue using our public site, http://genome.ucsc.edu, you can edit your genomes.txt file to add the setting "htmlPath someFile" to the stanza for each genome. The file "someFile" can be an empty file.
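
As a sketch of that workaround, the awk filter below appends an htmlPath line to every genomes.txt stanza that lacks one. The function name add_html_path and the file name empty.html are placeholders for illustration; it assumes stanzas are separated by blank lines.

```shell
# add_html_path GENOMES_TXT HTML_FILE
# Print GENOMES_TXT with "htmlPath HTML_FILE" appended to each stanza that
# does not already contain an htmlPath setting.
add_html_path() {
    awk -v html="$2" '
        function close_stanza() {
            if (in_stanza && !has_html) print "htmlPath " html
        }
        /^[[:space:]]*$/ { close_stanza(); in_stanza = 0; has_html = 0; print; next }
        { in_stanza = 1; if ($1 == "htmlPath") has_html = 1; print }
        END { close_stanza() }
    ' "$1"
}

# Possible usage (empty.html is a hypothetical empty file next to genomes.txt):
# : > empty.html
# add_html_path genomes.txt empty.html > genomes.new.txt
```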

The fix for this issue should make it to our public site with our next software release on May 31.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group


Vaneet Lotay

Jul 26, 2016, 7:01:55 PM
to Brian Lee, gen...@soe.ucsc.edu

Hey Brian,

 

I was replacing the bigBed file with a new version of the Xenopus Laevis 9.1 genome model, and I’ve encountered the same errors again with gene search that you helped me with in the previous iteration of this:

 

Warning/Error(s):

  • Window out of range on Scaffold100

 OK 

 

My UCSC trackhub is hosted at this address here:

 

ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/hub.txt

 

You can even see the clear difference where I’ve replaced the XL9.1 bigbed track on the ‘top’ and I still have the old sorted bigBed track on the ‘bottom’ that you gave me.  If you click on the search results from the ‘bottom’ track no errors are shown but from the ‘top’ track it displays the error shown above with the wrong Scaffold usually.

 

I still don’t know why this error happens, you believed there was something wrong with how I build my bigBed which is probably right, but we could never pinpoint what exactly.  To save a lot of emails this time, can you simply do what you did last time and take my current bigBed file (XL9_1v3_2_primary.bb), go back to the BED source file and build it again into a successful working .bb file for me to use?  I would really appreciate it.

 

Thanks,

 

Vaneet

 

Vaneet Lotay

Xenbase Bioinformatician

724 ICT Building - University of Calgary

2500 University Drive NW

Calgary AB T2N 1N4

CANADA

 

Vaneet Lotay

Aug 4, 2016, 2:03:24 PM
to Brian Lee, gen...@soe.ucsc.edu

Can you help me out with this Brian?  Or anyone else that can, that would be great, thanks!

 

Vaneet

 

Vaneet Lotay

Xenbase Bioinformatician

403-220-6652

724 ICT Building - University of Calgary

2500 University Drive NW

Calgary AB T2N 1N4

CANADA

 


Brian Lee

Aug 5, 2016, 5:37:49 PM
to Vaneet Lotay, gen...@soe.ucsc.edu
Dear Vaneet,

Thank you for using the UCSC Genome Browser and your question about your searchable bigBed.

I did some testing, and it does indeed look to be due to how you have built your bigBed file. I could not find your chrom.sizes file, but I think it may be the source of your problems. To explain the error you are seeing: if you search your hub for a gene like "cox18", you get a hit on "cox18.L at Scaffold100:74759593-74773706". However, that scaffold in your hub has the size Scaffold100:1-313,023, so the generated link points to a region that is "out of range on Scaffold100". I suspect that if you look at the line for Scaffold100 in your chrom.sizes file, you may find Scaffold100 219879705 rather than Scaffold100 313023.
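
A quick way to test that suspicion is to compare the chrom.sizes file used for the build against one generated from the 2bit. The sketch below is only an illustration; the function name diff_chrom_sizes and the file name my.chrom.sizes are made up.

```shell
# diff_chrom_sizes MINE REFERENCE
# For every chrom in MINE, print "chrom<TAB>size in MINE<TAB>size in
# REFERENCE" when the sizes differ, or "chrom<TAB>size<TAB>missing" when
# REFERENCE has no entry for it.
diff_chrom_sizes() {
    awk '
        NR == FNR { ref[$1] = $2; next }             # first file: REFERENCE
        !($1 in ref) { print $1 "\t" $2 "\tmissing"; next }
        $2 != ref[$1] { print $1 "\t" $2 "\t" ref[$1] }
    ' "$2" "$1"
}

# Possible usage:
# diff_chrom_sizes my.chrom.sizes xl9_1.chrom.sizes
```

No output means the two files agree on every chrom listed in the first one.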

Try doing the following steps.

1. Acquire these files:
wget http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/xl9_1.chrom.sizes

wget http://hgwdev.cse.ucsc.edu/~brianlee/hubTesting/testMLQ16750searchAssembly/xl9_1/bed12.as

2. Regenerate your bed file from your new gene track:

bigBedToBed XL9_1v3_2_primary.bb original.bed

3. Then use the above bed12.as and xl9_1.chrom.sizes files to regenerate your .bb

bedToBigBed -as=bed12.as -type=bed12 -extraIndex=name original.bed xl9_1.chrom.sizes newXL9_1v3_2_primary.bb

4. Replace XL9_1v3_2_primary.bb with newXL9_1v3_2_primary.bb in your hub and I would expect it should work. I would remove your older chrom.sizes too and use xl9_1.chrom.sizes, which was generated from your 2bit long ago (twoBitInfo XL_9_1_scaffolds.2bit stdout | sort -k2rn > xl9_1.chrom.sizes).


Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genomics Institute

