Uploading VCF tracks from a server

Mohammed Al Abri

Aug 15, 2017, 2:49:31 PM
to gen...@soe.ucsc.edu, Hu, Zhiliang [AN S], Brooks,Samantha
Dear UCSC team,

I am having trouble uploading a bigDataUrl custom track from one particular server, but I am able to do so from another one; both are in the US. Are there possibly any restrictions on particular servers or locations?

For instance, when I try to load the file at the following URL, "https://www.animalgenome.org/repository/horse/6_horse-breeds_variants/Annotations/GATK_annotated_SNPs_Indels.vcf.gz", the UCSC Genome Browser isn't able to load it.

Note that I am still able to download the file!

However, when I try uploading it from the following URL:
"https://bio.rc.ufl.edu/pub/brooks/six_genomes/Al_Abri_2016/Tracks/GATK_annotated_SNPs_Indels.vcf.gz" it loads just fine.

I am just wondering what may be the problem.
I would really appreciate your help with this.

-Thank you

Brian Lee

Aug 15, 2017, 4:17:18 PM
to Mohammed Al Abri, gen...@soe.ucsc.edu, Hu, Zhiliang [AN S], Brooks,Samantha

Dear Mohammed,

Thank you for using the UCSC Genome Browser and for your question about loading a particular file as a custom track through a bigDataUrl.

For binary custom tracks (such as bigBed, bigWig, BAM, VCF, and other "big*" files) to display, the server hosting them must allow byte-range requests. This lets the Browser ask the server for only the small region of data it needs, instead of transferring the entire file (which can be quite sizable for some data). Some servers, especially free ones like Dropbox, do not enable byte-range requests because doing so would allow people to stream pirated video content, which likewise relies on byte-range requests. A separate requirement is that BAM and VCF files need an additional index file (such as bam.bai or vcf.gz.tbi) at the same location, so the Browser can quickly find the section of the file where the data resides (see more here: http://genome.ucsc.edu/goldenPath/help/vcf.html).
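
To illustrate the index naming convention described above, here is a minimal Python sketch that derives the URL where an index would be expected by default (the data file URL with ".tbi" or ".bai" appended). The function name expected_index_url and the example.org URL are hypothetical and used only for illustration; this is not Browser code.

```python
from urllib.parse import urlsplit, urlunsplit

# Index suffix conventionally appended to the full data file name
# (e.g. file.vcf.gz -> file.vcf.gz.tbi, file.bam -> file.bam.bai).
INDEX_SUFFIXES = {".vcf.gz": ".tbi", ".bam": ".bai"}


def expected_index_url(data_url: str) -> str:
    """Return the default index URL for a VCF or BAM data URL (hypothetical helper)."""
    parts = urlsplit(data_url)
    for extension, suffix in INDEX_SUFFIXES.items():
        if parts.path.endswith(extension):
            # Append the suffix to the path only, leaving host and query untouched.
            return urlunsplit(parts._replace(path=parts.path + suffix))
    raise ValueError(f"no known index convention for {data_url!r}")


if __name__ == "__main__":
    print(expected_index_url("https://example.org/data/sample.vcf.gz"))
```

An index named sample.vcf.tbi.gz would not match the URL this produces, which is why a misnamed index is not found automatically.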

One can check whether a server supports byte-range requests for a file by running "curl -I http://URL/to/file.vcf" and looking for a response header like "Accept-Ranges: bytes". For the second URL you shared, this is the case.

For the first URL you shared, the response is instead "HTTP/1.1 404 Not Found". This message suggests that the file is not even publicly available for download at that URL.

And in fact, if you navigate to that URL, you get a 404 message:
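
For anyone who prefers to script this check, the following is a rough Python equivalent of the curl command, using only the standard library. The helper names accepts_byte_ranges and head_headers are hypothetical; treat this as a sketch under those assumptions rather than an official tool.

```python
import urllib.request


def accepts_byte_ranges(headers) -> bool:
    """Return True if the headers advertise 'Accept-Ranges: bytes' (case-insensitive)."""
    normalized = {str(k).lower(): str(v).strip().lower() for k, v in headers.items()}
    return normalized.get("accept-ranges") == "bytes"


def head_headers(url: str, timeout: float = 10.0) -> dict:
    """Send a HEAD request (like `curl -I`) and return the response headers.

    urllib raises urllib.error.HTTPError for error responses such as 404 Not Found.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())
```

Usage would be accepts_byte_ranges(head_headers(url)) with the URL of the hosted file. A server can honor range requests without sending this header, so a False result is a hint to investigate rather than a definitive answer.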
https://www.animalgenome.org/repository/horse/6_horse-breeds_variants/Annotations/GATK_annotated_SNPs_Indels.vcf.gz

Exploring that site, I found another location that might be equivalent. It does support byte-range requests and has an index file, but that index file is named incorrectly (vcf.tbi.gz instead of vcf.gz.tbi): https://www.animalgenome.org/repository/horse/6_horse-breeds_variants/

Upon some examination, using the files you shared, it appears the tbi file at the repository location was built incorrectly. You can see this in the file size differences (as well as in the file name, vcf.tbi.gz versus vcf.gz.tbi):

792318896 Jan 12 2016 Annotations_GATK_SNPs_Indels.vcf.gz
1687550 Jan 12 2016 Annotations_GATK_SNPs_Indels.vcf.tbi.gz
792318896 Jan 12 2016 GATK_annotated_SNPs_Indels.vcf.gz
1694600 Jan 12 2016 GATK_annotated_SNPs_Indels.vcf.gz.tbi

The UCSC Browser has a feature called "bigDataIndex" that lets you point to a different location for an index file. It works well in this case, swapping out the incorrectly built Annotations_GATK_SNPs_Indels.vcf.tbi.gz for the correctly built GATK_annotated_SNPs_Indels.vcf.gz.tbi file.

You can enter this text into the equCab2 custom track page (http://genome.ucsc.edu/cgi-bin/hgCustom?db=equCab2) and it should load:

track type=vcfTabix name="HorseVCF" description="Data from https://www.animalgenome.org/ BUT index from https://bio.rc.ufl.edu" bigDataUrl=https://www.animalgenome.org/repository/horse/6_horse-breeds_variants/Annotations_GATK_SNPs_Indels.vcf.gz bigDataIndex=https://bio.rc.ufl.edu/pub/brooks/six_genomes/Al_Abri_2016/Tracks/GATK_annotated_SNPs_Indels.vcf.gz.tbi
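
If you need to generate track lines like the one above for several files, a small Python sketch along these lines may help. The function vcf_tabix_track_line and the example.org URLs are hypothetical; only the settings themselves (type=vcfTabix, name, description, bigDataUrl, bigDataIndex) come from the track line above.

```python
from typing import Optional


def vcf_tabix_track_line(name: str, description: str, data_url: str,
                         index_url: Optional[str] = None) -> str:
    """Assemble a single-line vcfTabix custom track definition (hypothetical helper)."""
    fields = [
        "track type=vcfTabix",
        f'name="{name}"',
        f'description="{description}"',
        f"bigDataUrl={data_url}",
    ]
    # bigDataIndex is only needed when the index is not at data_url + ".tbi".
    if index_url is not None:
        fields.append(f"bigDataIndex={index_url}")
    return " ".join(fields)


if __name__ == "__main__":
    print(vcf_tabix_track_line(
        "ExampleVCF",
        "Data from one server, index from another",
        "https://example.org/data/sample.vcf.gz",
        "https://example.net/indexes/sample.vcf.gz.tbi",
    ))
```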

Would you please contact the https://www.animalgenome.org/ group and suggest they rebuild their tabix file? Alternatively, they could apparently swap in the one at GATK_annotated_SNPs_Indels.vcf.gz.tbi, provided they rename it to "Annotations_GATK_SNPs_Indels.vcf.gz.tbi" to match the data file name.

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genomics Institute



Hu, Zhiliang [AN S]

Aug 15, 2017, 5:21:49 PM
to Mohammed Al Abri, gen...@soe.ucsc.edu, Brooks,Samantha
Mohammed,

The file you referred to, "https://www.animalgenome.org/repository/horse/6_horse-breeds_variants/Annotations/GATK_annotated_SNPs_Indels.vcf.gz", does not exist on this site. Please see "https://www.animalgenome.org/repository/horse/6_horse-breeds_variants/" for the available files.

Perhaps that was the file name before you reorganized/renamed the files?

Zhiliang
Helpdesk | AnGenMap | Animal QTLdb | CorrDB | Data Repository |
NAGRP Databases and Web Sites | USDA-NIFA-NRSP-8 Bioinformatics
ADDR: https://www.AnimalGenome.ORG/


