Correcting "accidental typos" in hg18 / hg19 data hubs ?


Jelena Telenius

Aug 12, 2015, 3:21:16 PM
to gen...@soe.ucsc.edu
To the very capable UCSC browser team,

When trying to fix typos in my short labels in hg18 / hg19 hubs in the tracks.txt file,
I have faced very curious features of the data hubs :

I have noticed "instabilities" in hg18 and hg19 data hubs. For example:

- hubCheck -clear does not always clear the hub (complains about the port, and refuses to run clear)
- after successfully clearing and re-loading the hub (with hubCheck), some fields (especially the shortLabel) still have the old name
- udcTimeout=1 has no effect on this
- after changing the hub genome (from hg18 to hg19), and clearing hub with hubCheck, the hub still shows the old genome
- after moving the whole hub to a new public path (and changing the data after that), loading the hub still gives the OLD data
- the hub gets recognized even when it is "not there" any more: if the hub.txt is moved to a different folder and points to different genome.txt and tracks.txt files, it still gives the old information.

This is not only server lag - some hubs refuse completely to change their shortLabel : longest time so far has been 2 weeks (and counting).

None of this happens in mm9 genome.

I cannot reliably reproduce any of these - they seem to be "stochastic" things that happen "every now and then".
However, once a hub gets "wonky" this way, it seems that "nothing helps": clearing it with hubCheck does not help, moving the hub to a different location on the local server does not help, and neither does changing the hub name etc.

I have a feeling that the hub somehow stays lingering as a "ghost" in the UCSC databases - and cannot be removed ?

These things seem most likely to happen when I want to make "reasonably small changes" to the hub, for example when I want to fix a typo in a shortLabel.
Is it possible that the hub is somehow seen as "similar enough", so that the old one is associated with it instead of the corrected one, even though hubCheck -clear was run on it successfully ?

Some of these things seem to "vanish" with time - but for one of our hubs we have waited for 2 weeks,
and it still loads as the "same wrong hub" - under different hub names, in different local hub folder locations, and whether or not it has been cleared with hubCheck.


I am mystified - very interesting !

Could you tell me, how to survive with this kind of behavior - is there some kind of "reset" method I haven't yet tried ?

These things I have tried (accompanying each change with hubCheck -clear and udcTimeout=1 ) :
- changing hub name
- changing hub.txt file name
- changing track name (for the track I want to change the label)
- changing folder of hub in the public server
- deleting hub.txt - letting hub "crash" to red warning, re-loading hub with restored hub.txt


All the best !

Jelena Telenius



--
Jelena Telenius, PhD
bioinformatician

Genome Biology Group
MRC Department of Molecular Haematology
Weatherall Institute Of Molecular Medicine
Oxford, OX3 9DS 
United Kingdom

jelena....@gmail.com

Jonathan Casper

Aug 19, 2015, 8:47:12 PM
to Jelena Telenius, gen...@soe.ucsc.edu

Hello Jelena,

Thank you for your question about issues with your track hubs. That certainly does sound puzzling!

The hubCheck utility does not interact with UCSC's cache of your data files (hubCheck makes its own local cache on your computer), so running hubCheck -clear will not cause the Genome Browser to immediately reload your hub files. The -clear option has actually been removed completely from the most recent version of hubCheck - it might be worth updating and seeing if that resolves any of your issues. All the same, 2 weeks is excessive.

We cache the hub.txt and related hub files on our servers, and refresh them when they are requested more than 5 minutes after our last update. There is no "similar enough" - we just download the new file completely and replace our old copy. If you make a change to your hub, you can force our servers to reload your hub sooner by adding the URL parameter &udcTimeout=N, where N is the number of seconds to cache the hub files for before reloading. For example, you could load your hub onto the UCSC Genome Browser, then click in the location bar for hgTracks (e.g., http://genome.ucsc.edu/cgi-bin/hgTracks?db=danRer7&position=chr13%3A25004825-25082273&hgsid=441095055_Mr9cJhXXvWZd78i6fAOeHA8quJXH), paste "&udcTimeout=5" on the end of it, and then press enter to load that URL. The Genome Browser should now reload your hub files after 5 seconds, and will show a message at the top of every page as follows:

The Genome Browser cart currently includes the "udcTimeout" string. While this is useful for debugging hubs, it may negatively impact performance. To clear this variable, click here.

As an aside, using udcTimeout=1 forces the browser to reload the hub pretty much every time you refresh the page on the Genome Browser. That is a rather excessive amount of refreshing, and we recommend that you only use it for testing and remove it as soon as you're done. Otherwise it generates a significant amount of additional traffic to and from our servers.
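As a rough illustration of the udcTimeout step described above, the Python sketch below appends the parameter to an hgTracks URL. The helper name with_udc_timeout is hypothetical (it is not part of any UCSC tool), and the example URL is simply the one quoted earlier with the session ID left out.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_udc_timeout(url: str, seconds: int = 5) -> str:
    """Return `url` with udcTimeout=<seconds> set in its query string.

    Hypothetical helper for illustration only. Any udcTimeout value
    already present in the URL is dropped so the parameter appears once.
    """
    parts = urlsplit(url)
    # Keep every existing parameter except a previous udcTimeout.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "udcTimeout"
    ]
    params.append(("udcTimeout", str(seconds)))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    example = (
        "http://genome.ucsc.edu/cgi-bin/hgTracks"
        "?db=danRer7&position=chr13%3A25004825-25082273"
    )
    print(with_udc_timeout(example, 5))
```

Loading the printed URL in a web browser has the same effect as pasting "&udcTimeout=5" by hand. The advice about removing the setting once testing is done still applies.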

It looks like you're already aware of at least some of this, but I wanted to make sure we're on the same page. If you are still having issues, my next thought would be that something in your cart might be interfering. You might try going to the gateway page at http://genome.ucsc.edu/cgi-bin/hgGateway and clicking the "cart reset" button. That should clear out any existing settings (including loaded hubs). Then you can click on the track hubs button and paste the URL to your hub to load it again fresh. Note that performing a cart reset will also clear the udcTimeout option, so you will have to set that again before loading your hub. You can also try visiting the public URL of your hub files in your web browser to confirm that your changes are visible to UCSC.
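One way to do that last check outside the web browser is to read the shortLabel values from the tracks.txt that your server actually serves. The following is a minimal Python sketch, assuming the usual trackDb layout (stanzas separated by blank lines, each line a setting name followed by its value). The function names and the URL in the comment are hypothetical placeholders, not part of any UCSC tool.

```python
from urllib.request import urlopen


def fetch_text(url: str) -> str:
    """Download a hub file and return its contents as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def short_labels(trackdb_text: str) -> dict:
    """Map each track name to its shortLabel, as found in trackDb text."""
    labels = {}
    track = None
    for raw_line in trackdb_text.splitlines():
        line = raw_line.strip()
        if not line:
            track = None  # a blank line ends the current stanza
            continue
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.split(None, 1)
        key = fields[0]
        value = fields[1].strip() if len(fields) > 1 else ""
        if key == "track":
            track = value
        elif key == "shortLabel" and track is not None:
            labels[track] = value
    return labels


if __name__ == "__main__":
    # Replace with the public URL of your own tracks.txt, for example:
    #   text = fetch_text("http://your.server.example/myHub/hg19/tracks.txt")
    text = "track myTrack\nshortLabel Corrected label\nlongLabel Example\n"
    for name, label in short_labels(text).items():
        print(name, "->", label)
```

If the labels printed from your public URL are already the corrected ones, the files on your server are up to date, and any stale label is coming from somewhere other than your server.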

If you change the URL to your hub, the new URL is treated as a completely new and separate hub. That should prevent the contents of the old file from having any effect on the hub with the new URL. I would love to see this in action. Are you able to share the URL of one of these hubs with me? Or a saved session with the hub loaded?

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group




Jelena Telenius

Aug 20, 2015, 12:14:30 PM
to Jonathan Casper, gen...@soe.ucsc.edu
Hi Jonathan,

thanks for getting back to us !

I will revisit all the things you mentioned, and check the new tricks you sent ! - and see if I can fix it for our hubs.

Very grateful

Jelena Telenius

p. +44 79759 23522 (British number)
p. +358 50 539 5225 (Finnish number)