UCSC trackhub search error

Vaneet Lotay

Apr 20, 2018, 1:11:13 PM

to gen...@soe.ucsc.edu

Hello,

I’m having problems setting up the gene search function for the Xenopus Laevis 9.2 genome I’ve added to my UCSC track hub:

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47641_xl9_2&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr3L%3A118176802-118205069&hgsid=665156243_2m0iXSiyp9wzrYvtnx1OWoe4qJvO

The search returns results for genes that are in my original BED file, but it places them on the wrong chromosome or scaffold, which results in this error:

Window out of range on Scaffold1117

I know why this error usually occurs, as I’ve had it in the past. When I sent you the issue, you explained that the chrom.sizes file I used to create the bigBed file was not sorted consistently with my original BED file. This time, however, I have sorted my BED file and chrom.sizes file so that they are in sync: if I run a ‘diff’ command between the unique chromosomes in my BED file and the chromosomes column in my chrom.sizes file, there is no difference, which means the values should be identical and in the same order.

So I don’t understand why this error still happens. When I make sure everything’s sorted properly in Tropicalis with the same process, the search works fine. I can send you my files privately once I get a response to this e-mail if you would like to have a look. I’ve been using the ‘searchIndex name’ and ‘searchTrix ixFile’ parameters in the trackDb for the gene search. I used this command to create the bigBed file:

bedToBigBed -type=bed12 -extraIndex=name bedFile chrom.sizes out.bb

Also the Xenopus BED file I’m using is originally from the main UCSC genome browser website.

The only thing that’s worked in the past is when you guys have produced the bigBed file for me and I’ve loaded it into my trackhub. We might need to do that again, but I would love to find the original source of the problem too, if possible.

Thanks,

Vaneet

Vaneet Lotay

Xenbase Bioinformatician

716 ICT Building - University of Calgary

2500 University Drive NW

Calgary AB T2N 1N4

CANADA

Brian Lee

Apr 24, 2018, 1:35:12 AM

to Vaneet Lotay, gen...@soe.ucsc.edu

Dear Vaneet,

Thank you for using the UCSC Genome Browser, for making Track Hubs for your assemblies, and for your question about the search problems in your files.

It looks like an issue similar to the one in a previous correspondence, https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/sXu4x679eQ4/dVV5e1IoBgAJ, is occurring, where an improper chrom.sizes file used to build the index for your bigBed is at issue.

I have made a temporary copy of your hub and rebuilt the bigBed after extracting the data to BED and can discover items correctly (such as bpi.S at Scaffold101:16971-37482) using a command such as the following (as noted in the earlier correspondence):

bedToBigBed -as=bed12.as -type=bed12 -extraIndex=name BED_file_of_gene_data chrom.sizes bigBed_file_of_gene_data.bigBed

To create the chrom.sizes file I used a command like:

twoBitInfo XL9_2_GCF.2bit stdout | sort -k2nr > chrom.sizes
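
As a rough sanity check on a chrom.sizes file before rebuilding (a sketch only, not part of the UCSC tools), one could confirm that every chromosome named in the BED file appears in chrom.sizes and that no item ends past the listed size. The function name is hypothetical, both files are assumed to be tab-separated, and the check looks only at names and lengths, not at sort order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag BED rows that disagree with a chrom.sizes file.
# Assumes chrom.sizes is "name<TAB>size" and the BED file has
# chrom, chromStart, chromEnd in columns 1-3.
check_bed_against_sizes() {
    local bed=$1 sizes=$2
    awk -F'\t' '
        # First file (chrom.sizes): remember each chromosome length.
        NR == FNR { size[$1] = $2; next }
        # Second file (BED): the chromosome must be known...
        !($1 in size) {
            print "unknown chrom " $1 " at BED line " FNR; bad = 1; next
        }
        # ...and the item must end within it.
        ($3 + 0) > (size[$1] + 0) {
            print $1 " item ends at " $3 ", past size " size[$1] " (BED line " FNR ")"; bad = 1
        }
        END { exit bad }
    ' "$sizes" "$bed"
}
```

Called as check_bed_against_sizes genes.bed chrom.sizes (placeholder file names), it prints nothing and returns 0 when the two files agree.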

It seems you may have edited your chrom.sizes file and it may have had unexpected consequences on the index that was built in your bigBed.

For instance, there is a tool called "bigBedInfo" that can be given the "-chroms" option to list all chromosomes that have data in the bigBed, along with the sizes of those chromosomes.

If you use bigBedInfo on a bigBed built correctly with the above steps, you will find that it returns "Scaffold101 0 319059", meaning Scaffold101 is given the first assignment, 0, while an entry much further down, "Scaffold4132 84 1401", gives Scaffold4132 the assignment of 84.

When I look at your bigBed with bigBedInfo with a command like the following:

bigBedInfo -chroms ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/xl9_2/XL9_2_UCSC.bb

It appears that in your file "Scaffold101 84 319059" is given the assignment of 84. This seems to explain why the item bpi.S, which should be on Scaffold101 with a BED entry like "Scaffold101 16970 37482 bpi.S ...", is found at Scaffold4132:16971-37482 when searched for. When these improper scaffold coordinates are clicked on from a search for bpi.S, the error message makes some sense. The coordinates, 16971-37482, are correct, but the scaffold is not: it should be Scaffold101:16971-37482. Because Scaffold4132 is much shorter than Scaffold101, the incorrect pairing Scaffold4132:16971-37482 runs off the end of Scaffold4132. You interpreted the error message correctly, but it is indeed confusing in this situation.
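
The comparison above could be automated roughly as follows. This is a sketch under two assumptions: that the output of bigBedInfo -chroms has been saved to a text file whose chromosome lines look like the "name id size" entries quoted above, and that in a correctly built file the ids rise in the same order as the byte-sorted (uppercase before lowercase) chromosome names. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read saved `bigBedInfo -chroms` output and test whether
# the numeric ids rise in the same order as the byte-sorted chromosome names.
chrom_ids_follow_byte_order() {
    local chroms by_id by_name
    # Keep only the "name id size" lines; other summary lines are skipped.
    chroms=$(awk 'NF == 3 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { print $1, $2 }' "$1")
    # Names ordered by their assigned id.
    by_id=$(printf '%s\n' "$chroms" | LC_ALL=C sort -t' ' -k2,2n | cut -d' ' -f1)
    # Names ordered byte-wise, so uppercase sorts before lowercase.
    by_name=$(printf '%s\n' "$chroms" | LC_ALL=C sort -t' ' -k1,1 | cut -d' ' -f1)
    [ "$by_id" = "$by_name" ]
}
```

For example, after bigBedInfo -chroms file.bb > chroms.txt (placeholder names), chrom_ids_follow_byte_order chroms.txt returns non-zero when a scaffold such as Scaffold101 has been given an id later than a lowercase "chr" name.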

Please try rebuilding your bigBed with a new chrom.sizes built from your 2bit file; it should fix the problem. I've repeated the build steps here and confirmed that it should work, and I can share a link to a working hub if you run into more problems.

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee

UC Santa Cruz Genomics Institute

Training videos & resources: http://genome.ucsc.edu/training/index.html

Brian Lee

Apr 26, 2018, 2:21:16 PM

to Vaneet Lotay, gen...@soe.ucsc.edu

Dear Vaneet,

Thank you again for sharing your experiences. We believe the root issue was likely that you had an older bedToBigBed utility. Newer versions are enhanced so that, when building a searchIndex, they check that the input BED file is properly sorted in a case-sensitive manner (uppercase chromosomes sort before lowercase chromosomes).

We added some notes to our documentation about the need for the input BED data to be sorted this way when using searchIndex (sort -k1,1 -k2,2n), and about how the newer bedToBigBed will catch improper input and not build the index if it isn't sorted in the expected manner.
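
To illustrate the sorting requirement, here is a minimal sketch of a case-sensitive sort and a check for it. The function names and demo data are hypothetical. LC_ALL=C is used here as an assumption on my part, because it forces byte-order collation even when other locale variables are set in the environment.

```shell
#!/usr/bin/env bash
# Sort by chromosome name in byte order (uppercase before lowercase),
# then numerically by start position.
sort_bed_case_sensitive() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Succeeds only if the file is already in that order.
# -c checks without sorting; -s treats rows with equal keys as in order.
is_bed_case_sensitive_sorted() {
    LC_ALL=C sort -c -s -k1,1 -k2,2n "$1" 2>/dev/null
}

# Tiny demo with made-up rows: the Scaffold row moves ahead of the chr rows.
demo=$(mktemp)
printf 'chr1L\t100\t200\tgeneA\nScaffold101\t16970\t37482\tbpi.S\nchr1L\t50\t80\tgeneB\n' > "$demo"
sort_bed_case_sensitive "$demo" | cut -f1,2
rm -f "$demo"
```

A file that passes is_bed_case_sensitive_sorted should be in the order described above; a file sorted under a case-insensitive locale, with chr names ahead of Scaffold names, would not pass.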

All the best,

Brian

Vaneet Lotay

Apr 26, 2018, 3:15:24 PM

to Brian Lee, gen...@soe.ucsc.edu

Hey Brian,

That’s great that you’ve finally found a root issue that may be causing these results. I’m glad I could help in that regard. I just want to clarify what you’re saying.

So you’re saying the new version of the bedToBigBed utility requires that input BED must be sorted so that uppercase chromosomes are sorted before lowercase chromosomes….or it doesn’t require this?

When you say you added notes to the documentation, where is this documentation and how can I access it?

Thanks,

Vaneet

Brian Lee

Apr 26, 2018, 4:42:08 PM

to Vaneet Lotay, gen...@soe.ucsc.edu

Dear Vaneet,

It has always been the case that for searchIndex to work the input BED file should be sorted (sort -k1,1 -k2,2n unsorted.bed > sorted.bed). The BED file you shared was not sorted in this fashion, although its name suggested it was fully sorted: XL9_2_UCSC_sorted.bed

Your file started with chr1L when it should have started with the capitalized chromosomes (Scaffolds) first:

head -n 1 XL9_2_UCSC_sorted.bed | cut -f 1
chr1L

By properly sorting your bed, the start of the file would be Scaffold101 instead:

LC_COLLATE=C sort -k1,1 -k2,2n XL9_2_UCSC_sorted.bed > XL9_2_UCSC_sorted_caseSensitive.bed
head -n 1 XL9_2_UCSC_sorted_caseSensitive.bed | cut -f 1
Scaffold101

My point about the current version of bedToBigBed is that it will catch this issue.

For example with your file an error will display:

$ bedToBigBed -as=bed12.as -type=bed12 -extraIndex=name XL9_2_UCSC_sorted.bed chrom.sizes output.bigBed
XL9_2_UCSC_sorted.bed is not case-sensitive sorted at line 10282. Please use "sort -k1,1 -k2,2n" with LC_COLLATE=C, or bedSort and try again.

Versus an input that was correctly sorted (XL9_2_UCSC_sorted_caseSensitive.bed):

$ bedToBigBed -as=bed12.as -type=bed12 -extraIndex=name XL9_2_UCSC_sorted_caseSensitive.bed chrom.sizes output.bigBed
pass1 - making usageList (161 chroms): 24 millis
pass2 - checking and writing primary data (10976 records, 12 fields): 180 millis
Sorting and writing extra index 0: 6 millis

The documentation is in the example steps around searchIndex and also in the Track Database Definitions page:
http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#searchIndex

The searchIndex setting requires the input BED data to be case-sensitive sorted (sort -k1,1 -k2,2n), where newer versions of the tool bedToBigBed (available here) are enhanced to catch improper input.

So in essence, as long as one is using the latest version of bedToBigBed one will be stopped from being able to build a searchIndex incorrectly because improperly sorted input will be caught.

All the best,
Brian

Vaneet Lotay

Apr 26, 2018, 4:50:40 PM

to Brian Lee, gen...@soe.ucsc.edu

Thanks, this is all very helpful!

Vaneet Lotay

May 24, 2018, 2:26:16 PM

to Brian Lee, gen...@soe.ucsc.edu

Hello Brian,

I’m having trouble loading 2 bigWig tracks on my UCSC track hub. Both say ‘No data’, and one of the tracks also produces this message:

ftp server response timed out > 1000000 microsec

Can’t get data socket for URLtoBigWigfile

I’m wondering why they won’t load, since I’m able to load these 2 bigWigs in another browser without issue. I even re-downloaded the files from their source and couldn’t find anything corrupt in the files themselves. I’ve sometimes seen this message before and believe it might be because not enough memory is reserved for the track hub, or because I need to change the load timeout given the size of the files. One bigWig file is 850 MB and the other is approx. 1.5 GB, which is why I think the issue could be size-related. If there’s a configuration setting that needs to be changed or maximized, please let me know.

Thanks,

Vaneet

Brian Lee

Jun 1, 2018, 5:43:32 PM

to Vaneet Lotay, gen...@soe.ucsc.edu

Hi Vaneet,

I apologize for the delay in responding. It looks as though something may have changed regarding where your files are hosted. It may be worth connecting with your system administrators to find out whether there has been a change. For these files to work, the server must allow byte-range requests, so that data can be extracted from any location in the file via its index.

Trying to load your previously shared assembly hub (ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/hub.txt) is providing a similar error:

ftp server error on cmd=[RETR sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/xl9_2/XL9_2_GCF.2bit
    ] response=[150 Opening BINARY mode data connection for sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/xl9_2/XL9_2_GCF.2bit (793035136 bytes).
    ]
    Can't get data socket for ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/xl9_2/XL9_2_GCF.2bit

Since this hub worked in the past, and it required byte-range requests then too, it seems most likely that something has changed at your server. Could you share more about how the bigWigs are displaying in another browser without issue? Is that perhaps a browser within your institution?

The best course of action would be to contact your system administrators and ask if they have implemented some kind of changes that are now preventing byte-range requests.
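
One way to show the administrators what a byte-range request looks like is a small test with curl. This is only a sketch: the function name and the example URL are hypothetical, and it relies on curl's --range option. A server that honours range requests should send back exactly the number of bytes asked for; a different count, a failure, or a timeout would point to a restriction.

```shell
#!/usr/bin/env bash
# Hypothetical helper: request an inclusive byte range and print how many
# bytes came back. Relies on curl's --range option.
range_byte_count() {
    local url=$1 start=$2 end=$3
    curl --silent --fail --max-time 30 --range "${start}-${end}" "$url" | wc -c | tr -d ' '
}

# Example (not run here, placeholder URL): a server that honours the
# range should report 100 for bytes 0-99.
# range_byte_count "ftp://example.org/path/to/file.bw" 0 99
```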

All the best,
Brian

Brian Lee

Jun 1, 2018, 6:07:00 PM

to Vaneet Lotay, gen...@soe.ucsc.edu

Hi Vaneet,

If you are contacting your system administrators, you should also consider moving from FTP to HTTP. Hosting hub files over HTTP tends to give faster connections for these requests and works better than FTP for track hubs.

Brian

Vaneet Lotay

Jun 1, 2018, 6:09:34 PM

to Brian Lee, gen...@soe.ucsc.edu

Hey Brian,

I’ve tried loading my trackhub from my desktop and on a separate laptop and they load fine with no errors.

Can you please try it again? Perhaps on a different browser or after clearing cache.

I meant that I’ve loaded these 2 bigwigs on the JBrowse browser and they load fine, meaning the files don’t appear corrupt. I also re-downloaded them from the source.

Brian Lee

Jun 4, 2018, 6:21:31 PM

to Vaneet Lotay, gen...@soe.ucsc.edu

Hi Vaneet,

It looks like there might be some restrictions on your server for files over a certain size. You should contact your system administrators with some examples of files that are not allowing connections.

These two files, each over 1.3GB, seem to give errors with the tools that extract data from them (bigBedToBed and bigWigToWig):

bigBedToBed ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/xt9_1/TFBS_all_v1_1.bb stdout | head


ftp server response timed out > 1000000 microsec

Can't get data socket for ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/xt9_1/TFBS_all_v1_1.bb

bigWigToWig ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/xt9_1/species_cons/xenTro9_phastCons11way.bw stdout | head


ftp server response timed out > 1000000 microsec

Can't get data socket for ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/xt9_1/species_cons/xenTro9_phastCons11way.bw

It appears smaller files do not have this restriction and can output information:

bigWigToWig ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/xt9_1/species_cons/xenTro9_phyloP11way.bw stdout | head 
fixedStep chrom=chr1 start=4156 step=1 span=1
0.685
... 
data from the file extracted
...

These tools are available here in case you wanted to share this with your system administrators.

bigWigToWig http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bigWigToWig
bigBedToBed http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bigBedToBed

Another change that would greatly benefit your hub and files is switching from hosting them over FTP to HTTP. Our engineer says that all you would have to do is add a symlink from your HTTP directory to your FTP directory. Another engineer suspects that switching to HTTP may also resolve this apparent size-restriction error.
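
The symlink step might look roughly like the sketch below. The function name and both paths are hypothetical, since I don't know your server layout, and your web server may need to be configured to follow symlinks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make an existing FTP directory reachable under the
# web root through a symlink. Both paths are placeholders.
link_ftp_dir_into_http_root() {
    local ftp_dir=$1 link_path=$2
    if [ ! -d "$ftp_dir" ]; then
        echo "no such directory: $ftp_dir" >&2
        return 1
    fi
    # Refuse to overwrite anything already at the link location.
    if [ -e "$link_path" ] || [ -L "$link_path" ]; then
        echo "already exists: $link_path" >&2
        return 1
    fi
    ln -s "$ftp_dir" "$link_path"
}

# Example (not run here, placeholder paths):
# link_ftp_dir_into_http_root /srv/ftp/sequence_information /var/www/html/sequence_information
```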

Brian
