VCF csi indexing


Udi Landau

Oct 11, 2021, 1:55:02 PM
to gen...@soe.ucsc.edu
Hi,

I am using GBiB with a plant genome.
Because the chromosomes are long, I cannot index my VCF file with a .tbi (tabix) index, so it is CSI indexed.
Can I upload a CSI-indexed VCF as a track to the Genome Browser?
Thank you for your help.

Udi

Luis Nassar

Oct 14, 2021, 5:50:53 PM
to Udi Landau, gen...@soe.ucsc.edu
Hello, Udi.

Thank you for your interest in the Genome Browser.

Unfortunately, we do not support csi indexes. Which plant genome are you working with?

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute


Udi Landau

Oct 18, 2021, 1:46:33 PM
to Luis Nassar, gen...@soe.ucsc.edu
Hello Lou,

Thank you for your reply.

I am working on an unpublished genome of a Poaceae species.
It is very unfortunate that I will not be able to use the VCF track, because much of our work is on genomic variation.

Udi

Luis Nassar

Oct 19, 2021, 8:11:04 PM
to Udi Landau, gen...@soe.ucsc.edu

Hello Udi,

After some investigation, it looks like you should be able to use CSI indexes after all. Could you try adding the VCF file to your GBiB as a custom track (http://genome.ucsc.edu/cgi-bin/hgCustom), as follows (note the bigDataIndex setting):

track type=vcfTabix name="vcfCSI" bigDataUrl=PATH/TO/VCF.GZ bigDataIndex=PATH/TO/VCF.GZ.CSI

Below is a working example on hg38 for reference:

track type=vcfTabix name="vcfCSI" bigDataUrl=https://hgwdev.gi.ucsc.edu/~lrnassar/ExampleCustomTracks/bamExample/NA12877.vcf.gz bigDataIndex=https://hgwdev.gi.ucsc.edu/~lrnassar/ExampleCustomTracks/bamExample/NA12877.vcf.gz.csi

Let us know if that works for you. If so, the same setting could be used to load the track as part of the hub.
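
As a rough illustration of the custom track line above, here is a minimal Python sketch that assembles such a line from a VCF URL. The helper name vcf_custom_track_line and the example.org URL are hypothetical, and the default of appending ".csi" to the VCF URL is only an assumption about how the files are hosted.

```python
def vcf_custom_track_line(name, vcf_url, index_url=None):
    """Build a one-line vcfTabix custom track definition.

    If index_url is omitted, assume the CSI index sits next to the VCF
    with a ".csi" suffix (an assumption about how the files are hosted).
    """
    if index_url is None:
        index_url = vcf_url + ".csi"
    return (
        f'track type=vcfTabix name="{name}" '
        f"bigDataUrl={vcf_url} bigDataIndex={index_url}"
    )


# Hypothetical URL, used only for illustration.
line = vcf_custom_track_line("vcfCSI", "https://example.org/data/sample.vcf.gz")
print(line)
```

The printed line can be pasted into the custom track page in the same way as the examples above.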

Another option would be to convert the VCF file to a bigBed (http://genome.ucsc.edu/FAQ/FAQformat.html#format1.5) to display the data. We have done this for some internal VCF tracks, such as dbSNP and gnomAD. Below is a link to the table schema for gnomAD as an example:

http://genome.ucsc.edu/cgi-bin/hgTables?db=hg38&hgta_group=varRep&hgta_track=gnomadGenomesVariantsV3_1_1&hgta_table=gnomadGenomesVariantsV3_1_1&hgta_doSchema=describe+table+schema

You would need to reformat the file so that it has at least the three required fields:

chrom
chromStart
chromEnd
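
To make the three required fields concrete, here is a small Python sketch of how one VCF data line might be mapped to BED coordinates. It relies on VCF POS being 1-based and BED using a 0-based start with an exclusive end. The function name and the sample record are hypothetical, and spanning the REF allele is one reasonable choice rather than the method used for the UCSC tracks.

```python
def vcf_record_to_bed_fields(vcf_line):
    """Convert one tab-separated VCF data line to
    (chrom, chromStart, chromEnd, ref, alt)."""
    cols = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = cols[0], int(cols[1]), cols[3], cols[4]
    chrom_start = pos - 1                # VCF POS is 1-based, BED start is 0-based
    chrom_end = chrom_start + len(ref)   # BED end is exclusive; span the REF allele
    return chrom, chrom_start, chrom_end, ref, alt


# Hypothetical record: a 1-base deletion written as REF=AT, ALT=A at position 101.
example = "chr1\t101\t.\tAT\tA\t50\tPASS\t."
print(vcf_record_to_bed_fields(example))
```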

You could then add any number of additional fields to the file which would be displayed in the track description page. It is worth noting that this would also allow you to use some settings available to bigBeds such as filters, mouseOvers, coloring, etc. I'll link to some more resources below:

http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html
http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html

If you decide to employ this approach and require any assistance or have any further questions let us know.

Lou Nassar
UCSC Genomics Institute

Udi Landau

Oct 20, 2021, 11:48:23 AM
to Luis Nassar, gen...@soe.ucsc.edu
Hello Lou,

Thank you very much for your reply.
I have already converted my VCF to bigBed and loaded it into the GBiB.
I can see the track in the browser, but I have a problem configuring the bigBed file.
The BED file I use has the bed3 format (chr, start, end) with 4 additional fields ('.', score [float], ref allele, alt allele).
I used bedToBigBed with -type=bed3+4 to create the bigBed file.
I can see the bar graph of the track on my GBiB; however, when I zoom in, instead of the bars I get the message: invalid signed integer: "item from the score column".
I also tried and failed to configure 'labelFields', like so: labelFields <fieldName[,fieldName]> (taken from https://genome.ucsc.edu/goldenpath/help/trackDb/trackDbHub.html#bigBed_-_Item_or_Region_Track_Settings).

It is not clear to me what to put in <fieldName[,fieldName]>.

I will also try to upload a VCF with csi index.

Thank you,
Udi

Luis Nassar

Oct 21, 2021, 8:04:19 PM
to Udi Landau, gen...@soe.ucsc.edu

Hi, Udi.

Glad to hear you were able to at least go the bigBed route. Let us know when you try the VCF with the csi index.

Regarding your issues with the bigBed file, there are a few suggestions. It can be hard to diagnose these problems without having access to the file itself.

Our first suggestion would be to rebuild the file, but instead of -type=bed3+4, use -type=bed5+2. In essence, your 4th and 5th fields are the standard BED fields name (in this case arbitrary dots) and score. The reason for this is that files with very few standard BED fields (BED3 and BED4) can sometimes have unexpected interactions.
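
The error reported earlier in the thread (invalid signed integer on the score column) is consistent with the standard BED score field being an integer from 0 to 1000, while the file here holds a float score. Below is a minimal Python sketch of one possible workaround: rescale the float linearly into 0-1000 before running bedToBigBed. The function name, the sample rows, and the linear rescaling are all assumptions; if the exact float value matters, it could instead be kept as an extra column after the five standard fields.

```python
def to_bed5_plus2(row, max_score):
    """Return a bed5+2 row with the float score rescaled to an integer 0-1000.

    row is [chrom, start, end, name, score, ref, alt] as strings.
    """
    chrom, start, end, name, score, ref, alt = row
    scaled = round(1000 * float(score) / max_score) if max_score > 0 else 0
    scaled = max(0, min(1000, scaled))   # clamp to the BED score range
    return [chrom, start, end, name, str(scaled), ref, alt]


# Hypothetical rows in the layout described in the thread.
rows = [
    ["chr1", "100", "101", ".", "37.5", "A", "G"],
    ["chr1", "250", "251", ".", "75.0", "C", "T"],
]
max_score = max(float(r[4]) for r in rows)
for r in rows:
    print("\t".join(to_bed5_plus2(r, max_score)))
```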

For labelFields, in the syntax:

labelFields <fieldName[,fieldName]>

Here labelFields is the name of the setting, and the text inside the angle brackets (<>) is the required value, except for anything inside square brackets ([]), which is optional. In this case that means you need to designate at least one field name, but can optionally pass others, separated by commas.

For a working example, you can take a look at the transMapV5 track on hg38: https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&c=chrX&g=transMapEnsemblV5

You will see a long list of available labels:

Label: common name organism abbreviation source database ...

And these are designated with the following trackDb setting:

labelFields commonName,orgAbbrev,srcDb,srcTransId,name,geneName,geneId,geneType,transcriptType

Note that these are the names of the fields in the file schema:

https://genome.ucsc.edu/cgi-bin/hgTables?db=hg38&hgta_group=genes&hgta_track=transMapEnsemblV5&hgta_table=transMapEnsemblV5&hgta_doSchema=describe+table+schema
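
Since labelFields values must match field names in the file schema, a quick check before editing trackDb can catch typos. The Python sketch below pulls field names out of an autoSql (.as) definition with a simple regular expression and reports any labelFields entries that are not present. The schema text, the table name snpExample, and both helper functions are hypothetical, and the regular expression only covers the common "type name;" layout.

```python
import re

# Hypothetical autoSql schema for a bed5+2 SNP file.
AS_TEXT = '''table snpExample
"Hypothetical schema for a bed5+2 SNP file"
    (
    string chrom;       "Chromosome or scaffold name"
    uint   chromStart;  "Start position (0-based)"
    uint   chromEnd;    "End position (exclusive)"
    string name;        "Item name"
    uint   score;       "Score from 0 to 1000"
    string ref;         "Reference allele"
    string alt;         "Alternate allele"
    )
'''

# Matches lines such as "uint chromStart;" or "int[blockCount] blockSizes;".
FIELD_RE = re.compile(r"^\s*\w+(?:\[\w+\])?\s+(\w+)\s*;")


def autosql_field_names(as_text):
    """Return the field names declared in an autoSql definition, in order."""
    names = []
    for line in as_text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def unknown_label_fields(label_fields, as_text):
    """Return the labelFields entries that are not fields in the schema."""
    known = set(autosql_field_names(as_text))
    return [f for f in label_fields.split(",") if f not in known]


print(autosql_field_names(AS_TEXT))
print(unknown_label_fields("name,ref,alt,geneName", AS_TEXT))
```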

If you are still having issues with display or data, could you send us a snippet of the raw data, as well as all commands and trackDb settings you are using?

Lou Nassar
UCSC Genomics Institute

Udi Landau

Oct 25, 2021, 1:57:11 PM
to Luis Nassar, gen...@soe.ucsc.edu
Hello Lou,

I have uploaded the VCF file with the CSI index and it works! Thank you.

Regarding the problem I had with the bigBed configuration, I understand now that labelFields is not what I needed.
I will try to explain my need, and I hope it will help you guide me to a solution. (I tried to do it for the SNP data, but it is something I also need in other tracks.)

I have made a gene track from a GFF3 file. The bigBed file was created following this guide: https://genome.ucsc.edu/goldenPath/help/hubQuickStartSearch.html
So, I have a gene track that displays the position of the genes with the exons, strand, name, etc.
What I would like to add to the BED file is the description of each gene (I have it in the GFF3), so that when I click on a gene, I can see this information.
How can I do that? Do I need to create a new 'table schema'?

Appreciate your help,
Udi

 


Daniel Schmelter

Oct 28, 2021, 8:10:15 PM
to Udi Landau, Luis Nassar, gen...@soe.ucsc.edu

Hello Udi,

Thanks for your patience in this reply.

Additional information can be added to an item description page using the bedDetail format, described here:

https://genome.ucsc.edu/FAQ/FAQformat.html#format1.7

With this format, you can include up to two extra columns, in addition to the standard 4-12 columns, for extra information like a URL or description text. This should allow you to visualize it in your assembly hub in the bedDetail format or by converting to bigBed.

To convert it to the bigBed binary format, you will want to use an autoSql (.as) file that matches your number of columns. You may need to create a custom table schema if your file does not match one of the pre-made formats. If so, you can combine fields from the bed12 and bedDetail schemas to create your own:

https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/bed12Source.as
https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/lib/bedDetail.as
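
For the gene-description case discussed above, here is a minimal Python sketch of pulling a description out of the GFF3 attributes column and appending it as one extra column to an existing bed12 row. The attribute key "description" is an assumption (it is not a reserved GFF3 attribute, so the actual key depends on the annotation), and the helper names and sample values are hypothetical.

```python
from urllib.parse import unquote


def gff3_attributes(attr_column):
    """Parse a GFF3 attributes column (key=value pairs separated by ';')
    into a dict, decoding percent-escaped characters."""
    attrs = {}
    for pair in attr_column.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def append_description(bed12_fields, attr_column, key="description"):
    """Return the bed12 row with one extra column holding the description.

    Tabs inside the description are replaced so the row stays tab-separated.
    """
    desc = gff3_attributes(attr_column).get(key, "")
    return bed12_fields + [desc.replace("\t", " ")]


# Hypothetical single-exon gene and its GFF3 attributes.
bed12 = ["chr1", "999", "2000", "gene0001", "0", "+",
         "999", "2000", "0", "1", "1001,", "0,"]
attrs = "ID=gene0001;Name=abcA;description=putative%20kinase%2C%20family%201"
print("\t".join(append_description(bed12, attrs)))
```

A file built this way would have 13 columns, so it would be converted with -type=bed12+1 and an .as file that lists the 12 standard fields followed by the description field. Because descriptions usually contain spaces, the -tab option of bedToBigBed is likely needed as well.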

I hope this was helpful. If you have any more questions, please reply-all to gen...@soe.ucsc.edu. All messages sent to that address are publicly archived. If your question includes sensitive data, please reply-all to genom...@soe.ucsc.edu.

All the best,

Daniel Schmelter
UCSC Genome Browser


Udi Landau

Nov 1, 2021, 2:20:26 PM
to Daniel Schmelter, Luis Nassar, gen...@soe.ucsc.edu
Hi Daniel and Lou,

That was exactly what I needed.
Thank you very much for the excellent product and great support!

Udi
