Issues with Variant Annotation Integrator


Ciosek,Julia L

Jul 27, 2021, 3:05:11 PM
to gen...@soe.ucsc.edu

Hello Team,

 

I am a new graduate student and am starting to work with custom tracks and the Variant Annotation Integrator. When I load my custom track, not all of it appears, and when it does, the Variant Annotation Integrator is not using the Ensembl genes. It only appears to use the Genscan genes.

 

This is what appears when I add the Custom tracks into the Genome Browser (https://genome.ucsc.edu/cgi-bin/hgTracks?db=equCab3&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr10%3A67264486%2D67823540&hgsid=1134819847_ZdjnhA6pfco0eIwmiYnSRbKaWKa9).

 

And this is what appears when I try to use the VAI:

Thank you for all of your help,

Julia Ciosek

jci...@ufl.edu

Matthew Speir

Jul 28, 2021, 10:12:48 PM
to Ciosek,Julia L, gen...@soe.ucsc.edu
Hi, Julia.

Thank you for your question about using the Variant Annotation Integrator (VAI).

Sorry to hear you're having issues with getting output from the VAI. I think all that's needed are some settings changes and you should see some results.

One of the main reasons you are not seeing any results at the moment for the Ensembl Genes track is that you unselected the boxes for "intergenic", "upstream/downstream" and "intronic" variants under the "Define Filters" section, but all of the variants in the default selected position (chr10:67,264,486-67,823,540) are of those types. If you check those boxes, you should see results for your UFL3267.vcf.gz track at that position. The Genscan Genes track has different transcript predictions from Ensembl Genes, and some of your variants just happened to overlap with these Genscan transcripts in a way that resulted in output.

If you want to see up to 100,000 variants genome-wide while excluding the intergenic, upstream/downstream, and intronic variants:
  1. Under the section "Select Genome Assembly and Region" select "genome" from the "region to annotate" drop-down menu
  2. Under "Select Variants", select "100,000" from the "maximum number of variants to be processed" drop-down menu
You should see some results that way. 
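
To illustrate why the filters above left the Ensembl Genes output empty, here is a minimal Python sketch. It is not how the VAI is implemented; the consequence labels, the `apply_filters` helper, and the toy variant positions are all hypothetical and only mirror the checkbox names mentioned above.

```python
# Hypothetical consequence labels that mirror the VAI checkboxes;
# the VAI's internal names may differ.
EXCLUDED = {"intergenic", "upstream/downstream", "intronic"}


def apply_filters(variants, excluded=EXCLUDED):
    """Keep only variants whose consequence type is not excluded."""
    return [v for v in variants if v["consequence"] not in excluded]


# Toy data (made up): every variant in the region has an excluded type.
region_variants = [
    {"pos": 67264500, "consequence": "intergenic"},
    {"pos": 67300000, "consequence": "intronic"},
    {"pos": 67400000, "consequence": "upstream/downstream"},
]

# With the three boxes unchecked, nothing survives the filter.
print(apply_filters(region_variants))
# With the boxes checked again (nothing excluded), all variants are reported.
print(len(apply_filters(region_variants, excluded=set())))
```

If every variant in the selected region falls into an excluded category, the output is empty even though the track loaded correctly.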

We also noticed that there are issues with the chromosome names in your BAM file (1700vEC3.bam) and VCF file (1700vEqCab3.vcf.gz) that are preventing them from displaying in the Genome Browser. They look to be NCBI accessions like 'NW_019646409.1' or 'NC_009144.3'; however, for BAM and VCF we require names like 'chr1' or '1'. We're hoping to make this easier in the future, but for now, you will need to change these chromosome names in the files themselves. For the VCF file, this can be done using our chromToUcsc utility, which will convert the chromosome names you currently have into those the UCSC Genome Browser recognizes. Here are a few steps to do this conversion:

1. Get chromToUcsc from our download server: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/chromToUcsc
2. Go into the same directory and make it executable: 
chmod +x chromToUcsc

3. Use the utility to download a chromAlias file: 
chromToUcsc --get=equCab3

4. Then use that with the utility to convert the chromosome names in your VCF: 
chromToUcsc -a equCab3.chromAlias.tsv -i 1700vEqCab3.vcf -o 1700vEqCab3.ucsc.vcf

To fix the BAM file, you will need to convert it to SAM format first, then use this command to convert the chromosome names:
chromToUcsc -a equCab3.chromAlias.tsv -i 1700vEC3.sam -o 1700vEC3.ucsc.sam -k 3

The "-k" option here tells chromToUcsc which column has the chromosome names.
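
To make the idea of a column-based rename concrete, here is a rough Python sketch of the general technique. It is not chromToUcsc's actual code: the helpers `load_alias` and `rename_chroms` are hypothetical, the alias table is assumed to be two tab-separated columns (alias, then UCSC name), and the pairing of 'NC_009144.3' with 'chr1' is used only for illustration. The sketch passes header lines through unchanged and leaves unknown names as they are; a real conversion may handle both differently.

```python
def load_alias(lines):
    """Build an alias -> UCSC name dict from tab-separated lines.

    Assumes column 1 is the alias and column 2 the UCSC name
    (an assumption; the real alias file layout may differ).
    """
    alias = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        alias[fields[0]] = fields[1]
    return alias


def rename_chroms(lines, alias, column=1):
    """Rename the chromosome in a 1-based column, like the -k idea above."""
    out = []
    for line in lines:
        # VCF headers start with '#', SAM headers with '@'; this sketch
        # copies them through without touching any names inside them.
        if line.startswith(("#", "@")):
            out.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[column - 1]
        fields[column - 1] = alias.get(name, name)  # unknown names kept as-is
        out.append("\t".join(fields) + "\n")
    return out


alias = load_alias(["NC_009144.3\tchr1\n"])
vcf = ["##fileformat=VCFv4.2\n", "NC_009144.3\t100\t.\tA\tG\t.\t.\t.\n"]
sam = ["read1\t0\tNC_009144.3\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\n"]

print(rename_chroms(vcf, alias)[1].split("\t")[0])            # VCF: column 1
print(rename_chroms(sam, alias, column=3)[0].split("\t")[2])  # SAM: column 3
```

In a VCF the chromosome name is in the first column, while in SAM it is in the third, which is why the SAM command above needs the column number spelled out.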

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Training videos & resources: http://genome.ucsc.edu/training/index.html

Want to share the Browser with colleagues? Host a workshop: http://bit.ly/ucscTraining

---

Matthew Speir

UCSC Cell Browser, Quality Assurance and Data Wrangler

Human Cell Atlas, User Experience Researcher

UCSC Genome Browser, User Support

UC Santa Cruz Genomics Institute

Revealing life’s code.


