Hello,
I’m having problems setting up the gene search function for the Xenopus laevis 9.2 genome I’ve added to my UCSC track hub.
The search returns results for genes that are in my original BED file, but places them on the wrong chromosome or scaffold, which produces this error:
I know why this error usually occurs: I’ve hit it in the past, reported it to you, and you explained that my chrom.sizes file was not sorted consistently with my original BED file when I created the bigBed file. This time, however, I have sorted my BED file and chrom.sizes file to be in sync. If I run a ‘diff’ between the unique chromosomes in my BED file and the chromosome column of my chrom.sizes file, there is no difference, so the two should contain identical values in the same order.
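For reference, the diff check I describe looks roughly like this (the two files below are tiny stand-ins, not my real Xenopus data):

```shell
# Stand-in BED and chrom.sizes files for illustration.
printf 'chr1L\t0\t10\nScaffold101\t5\t20\n' > genes.bed
printf 'Scaffold101\t1000\nchr1L\t500\n' > chrom.sizes
# Compare the unique chromosome names in the BED file against the
# first column of chrom.sizes.
cut -f1 genes.bed | sort -u > bed_chroms.txt
cut -f1 chrom.sizes | sort > sizes_chroms.txt
diff bed_chroms.txt sizes_chroms.txt && echo "chromosome names match"
```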
So I don’t understand why this error still happens. When I follow the same process for Tropicalis and make sure everything is sorted properly, the search works fine. I can send you my files privately once I get a response to this e-mail, if you’d like to have a look. I’ve been using the ‘searchIndex name’ and ‘searchTrix ixFile’ parameters in the trackDb for the gene search. I used this command to create the bigBed file:
bedToBigBed -type=bed12 -extraIndex=name bedFile chrom.sizes out.bb
Also the Xenopus BED file I’m using is originally from the main UCSC genome browser website.
The only thing that’s worked in the past is when you have produced the bigBed file for me and I’ve loaded it into my track hub. We might need to do that again, but I’d also love to find the original source of the problem.
Thanks,
Vaneet
Vaneet Lotay
Xenbase Bioinformatician
716 ICT Building - University of Calgary
2500 University Drive NW
Calgary AB T2N 1N4
CANADA
bedToBigBed -as=bed12.as -type=bed12 -extraIndex=name BED_file_of_gene_data chrom.sizes bigBed_file_of_gene_data.bigBed
twoBitInfo XL9_2_GCF.2bit stdout | sort -k2nr > chrom.sizes
Hey Brian,
That’s great that you’ve found a root cause that may explain these results; I’m glad I could help in that regard. I just want to clarify what you’re saying.
So you’re saying the new version of the bedToBigBed utility requires the input BED to be sorted so that uppercase chromosome names come before lowercase ones… or does it not require this?
When you say you added notes to the documentation, where is this documentation and how can I access it?
Thanks,
Vaneet
Dear Vaneet,
It has always been the case that for searchIndex to work, the input BED file must be sorted (sort -k1,1 -k2,2n unsorted.bed > sorted.bed). The BED file you shared was not sorted this way, although its name suggested it was fully sorted: XL9_2_UCSC_sorted.bed
Your file started with chr1L, when the capitalized chromosome names (the Scaffolds) should have come first:
head -n 1 XL9_2_UCSC_sorted.bed | cut -f 1
chr1L
After properly sorting your BED file, it would instead start with Scaffold101:
sort -k1,1 -k2,2n XL9_2_UCSC_sorted.bed > XL9_2_UCSC_sorted_caseSensitive.bed
head -n 1 XL9_2_UCSC_sorted_caseSensitive.bed | cut -f 1
Scaffold101
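A minimal illustration of the ordering rule, using made-up names:

```shell
# In the C locale, ASCII uppercase letters sort before lowercase ones,
# so Scaffold* names come before chr* names.
printf 'chr1L\nScaffold101\nchr2S\n' > names.txt
LC_COLLATE=C sort names.txt
# prints Scaffold101, then chr1L, then chr2S
```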
Note that the current version of bedToBigBed will catch this issue.
For example with your file an error will display:
$ bedToBigBed -as=bed12.as -type=bed12 -extraIndex=name XL9_2_UCSC_sorted.bed chrom.sizes output.bigBed
XL9_2_UCSC_sorted.bed is not case-sensitive sorted at line 10282. Please use "sort -k1,1 -k2,2n" with LC_COLLATE=C, or bedSort and try again.
Versus a correctly sorted input (XL9_2_UCSC_sorted_caseSensitive.bed):
$ bedToBigBed -as=bed12.as -type=bed12 -extraIndex=name XL9_2_UCSC_sorted_caseSensitive.bed chrom.sizes output.bigBed
pass1 - making usageList (161 chroms): 24 millis
pass2 - checking and writing primary data (10976 records, 12 fields): 180 millis
Sorting and writing extra index 0: 6 millis
The documentation is in the example steps around searchIndex and also in the Track Database Definitions page:
http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#searchIndex
The searchIndex setting requires the input BED data to be case-sensitive sorted (sort -k1,1 -k2,2n); newer versions of the bedToBigBed tool (available here) are enhanced to catch improper input.
So in essence, as long as one is using the latest version of bedToBigBed, improperly sorted input will be caught, and it will not be possible to build a searchIndex incorrectly.
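Putting it together, a sketch of the build steps (the filenames are placeholders, and the bedToBigBed step itself is shown commented out since it needs the real tool and data):

```shell
# Stand-in unsorted BED data.
printf 'chr1L\t100\t200\tgeneA\nScaffold101\t5\t50\tgeneB\n' > genes_unsorted.bed
# Case-sensitive sort, as searchIndex requires.
LC_COLLATE=C sort -k1,1 -k2,2n genes_unsorted.bed > genes_sorted.bed
head -n 1 genes_sorted.bed | cut -f1   # Scaffold101 now comes first
# The real build step would then be:
# bedToBigBed -type=bed12 -extraIndex=name genes_sorted.bed chrom.sizes genes.bb
```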
All the best,
Brian
Thanks, this is all very helpful!
Hello Brian,
I’m having trouble loading two bigWig tracks on my UCSC track hub. They show ‘No data’, and one of the tracks produces this message:
ftp server response timed out > 1000000 microsec
Can’t get data socket for URLtoBigWigfile
I’m wondering why they won’t load, since I can load these two bigWigs in another browser without issue. I even re-downloaded the files from their source and couldn’t find anything corrupt in the files themselves. I’ve seen this message before and suspect it might be due to insufficient memory being reserved for the track hub, or to a load timeout that needs to be raised because of the size of the files. One bigWig file is 850 MB and the other is approximately 1.5 GB, which is why I think the issue could be size-related. If a configuration setting needs to be changed or increased, please let me know.
Thanks,
Vaneet
Hi Vaneet,
I apologize for the delay in responding. It looks as though something may have changed regarding where your files are hosted; it may be worth checking with your system administrators. For these files to work, byte-range requests must be allowed so that data can be extracted from any location in the file via an index.
Trying to load your previously shared assembly hub (ftp://xenbaseturbofrog.org/sequence_information/RIMLS-SVH/ChIP-seq_Feb2015/hub.txt) is providing a similar error:
Since this hub worked in the past, and it requires byte-range requests, it seems most likely that something has changed on your server. Could you share more about how the bigWigs display in another browser without issue? Is that browser perhaps within your institution?
The best course of action would be to contact your system administrators and ask if they have implemented some kind of changes that are now preventing byte-range requests.
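One quick way to test this for an http-hosted file is to request a single byte and check the HTTP status: a server that honors byte ranges answers 206. A sketch (the URL is a placeholder, so the network call is shown commented out and only the status-line check runs here):

```shell
# Real check (placeholder URL; needs network access):
#   curl -sI -r 0-0 https://example.org/hub/data.bw | head -n 1
# A "206 Partial Content" status line means byte ranges are honored.
range_ok() {
  case "$1" in
    *206*) echo "byte ranges supported" ;;
    *)     echo "byte ranges NOT supported" ;;
  esac
}
range_ok "HTTP/1.1 206 Partial Content"
range_ok "HTTP/1.1 200 OK"
```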
All the best,
Brian
Hey Brian,
I’ve tried loading my track hub from my desktop and on a separate laptop, and it loads fine with no errors.
Can you please try it again? Perhaps on a different browser or after clearing cache.
I meant that I’ve loaded these two bigWigs in the JBrowse browser and they load fine, meaning the files don’t appear to be corrupt. I also re-downloaded them from the source.
Hi Vaneet,
It looks like there might be some restrictions on your server for files that are over a certain size. You should contact your system administrators with some examples of files that are not allowing connections:
These two files, over 1.3 GB, seem to give errors in the tools that would extract data from them (bigBedToBed and bigWigToWig):
It appears smaller files do not have this restriction and can output information:
These tools are available at the links below in case you want to share them with your system administrators:
bigWigToWig http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bigWigToWig
bigBedToBed http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bigBedToBed
Another change that would greatly benefit your hub and files is switching from hosting them over FTP to HTTP. One of our engineers says that all you would have to do is add a symlink from your HTTP directory to your FTP directory. Another engineer suspects that switching to HTTP may also resolve this apparent size-restriction error.
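The symlink idea might look like the following, assuming a typical layout where the web document root and FTP root are separate directories (the paths are illustrative; the runnable demo below uses throwaway directories):

```shell
# Throwaway stand-ins for the FTP and web document roots.
mkdir -p ftp_root/hubs www_root
# Expose the FTP-hosted hub directory under the web root.
ln -s "$(pwd)/ftp_root/hubs" www_root/hubs
ls -l www_root   # shows the symlink and its target
# On a real server this might look something like:
# ln -s /var/ftp/sequence_information /var/www/html/sequence_information
```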
Brian