Re: Simple changes to support samtools mpileup v. 1.3 VCF output

Matthew Speir

Sep 2, 2016, 6:03:32 PM
to open.g...@gmail.com, gen...@soe.ucsc.edu
Hi Ted,

Thank you for your question about VCF 4.2 support in the UCSC Genome
Browser.

I apologize for the delay in responding to your question. Thank you for
providing detailed information about how this format has changed. We are
looking into support for this new version of the VCF format and hope to
implement it in a future release of our software. However, we can't
provide an estimate of when that may be at this time.

If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible
Google Groups forum. If your question includes sensitive data, you may
send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

---

> Hi,
>
> samtools mpileup 1.2 and 1.3 produce VCF 4.2 compatible files.
>
> It's very difficult to "downgrade" VCFs for display in the UCSC Browser.
>
> There are a few simple fixes to make VCF 4.2 files compatible with the
> UCSC Browser:
>
> 1. The "unknown ALT" allele when a base matches the Reference Sequence
> is "<X>".
> The UCSC Browser cannot handle "multi-character alleles" but in fact
> this is equivalent to "." (a "missing value"). This could just be
> treated as ".", (instead of using sed and awk to modify the files to
> change all instances of "<X>" to ".").
>
> These can also be removed by piping through "bcftools call -m". This
> could be done as part of the upload process if a gVCF-type file is
> detected.
>
> The ##ALT definition in the header:
> ##ALT=<ID=X,Description="Represents allele(s) other than observed.">
>
> An example in a VCF variant row:
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT WC1
> 1 565596 rs9782892 G <X> 0 .
> DP=11;I16=5,6,0,0,433,17101,0,0,407,15059,0,0,164,3524,0,0;QS=1,0;MQSB=0.950952;MQ0F=0;DPR=11,0
> PL:DP:DV:DPR 0,33,255:11:0:11,0
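
The substitution described in point 1 could be sketched roughly as follows. This is only an illustration of the sed/awk workaround, not what the Browser does; the function name `mask_unobserved_alt` is made up for this example, and it assumes tab-separated VCF data lines. It only touches rows whose ALT column is exactly "<X>", and it does not adjust per-allele INFO or FORMAT values such as DPR or PL.

```python
def mask_unobserved_alt(line: str) -> str:
    """Return the VCF line with an ALT of exactly '<X>' replaced by '.'.

    Hypothetical helper for illustration only. Header lines, short lines,
    and rows with any other ALT value are returned unchanged.
    """
    if line.startswith("#"):
        return line  # meta-information and column header lines
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    # Column 5 (index 4) is ALT in the fixed VCF column order.
    if len(fields) >= 5 and fields[4] == "<X>":
        fields[4] = "."
        return "\t".join(fields) + newline
    return line


if __name__ == "__main__":
    row = "1\t565596\trs9782892\tG\t<X>\t0\t.\tDP=11\tPL:DP\t0,33,255:11\n"
    print(mask_unobserved_alt(row), end="")
```

Rows that mix a real alternate allele with "<X>" (for example "A,<X>") are deliberately left alone here, since dropping one allele would also require trimming the matching per-allele values.
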
>
> 2. ##INFO and ##FORMAT lines allow characters such as ( and ) that do
> not match the current format regex. The browser throws the following
> error:
>
> Error :
> http://www.open-genomes.org/genomes/Kilinc%20(2016)/Bon002/Bon002_VCF_4.2_filtered_by_All_SNPs_dbSNP-147_with_plus_orientation_rsIds_alleles_type_filtered_for_single_and_exceptions_with_AX-ids-annnotated.vcf.gz:-1:
> ##INFO line does not match expected pattern /^##(INFO|FORMAT)=$/ or
> /^##(INFO|FORMAT)=([A-Za-z0-9_:-]+),(\.|A|G|[0-9-]+),([A-Za-z]+),"?(.*)"?$/:
> "##INFO="
> http://www.open-genomes.org/genomes/Kilinc%20(2016)/Bon002/Bon002_VCF_4.2_filtered_by_All_SNPs_dbSNP-147_with_plus_orientation_rsIds_alleles_type_filtered_for_single_and_exceptions_with_AX-ids-annnotated.vcf.gz:-1:
> ##FORMAT line does not match expected pattern /^##(INFO|FORMAT)=$/ or
> /^##(INFO|FORMAT)=([A-Za-z0-9_:-]+),(\.|A|G|[0-9-]+),([A-Za-z]+),"?(.*)"?$/:
> "##FORMAT="
> http://www.open-genomes.org/genomes/Kilinc%20(2016)/Bon002/Bon002_VCF_4.2_filtered_by_All_SNPs_dbSNP-147_with_plus_orientation_rsIds_alleles_type_filtered_for_single_and_exceptions_with_AX-ids-annnotated.vcf.gz:-1:
> ##INFO line does not match expected pattern /^##(INFO|FORMAT)=$/ or
> /^##(INFO|FORMAT)=([A-Za-z0-9_:-]+),(\.|A|G|[0-9-]+),([A-Za-z]+),"?(.*)"?$/:
> "##INFO="
>
> It would be simple enough to change the regex for the ##INFO and
> ##FORMAT lines.
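
As a rough sketch of what a more permissive match for point 2 might look like: the two patterns below are assumptions written for this example and are not the Browser's actual regex. The idea is to accept anything between "<" and ">" on an ##INFO or ##FORMAT line and then split it into key=value pairs, so that quoted descriptions containing parentheses or commas, values such as Number=R, and extra keys such as Version are all tolerated.

```python
import re

# Hypothetical patterns for illustration; not taken from the Browser source.
HEADER_RE = re.compile(r'^##(INFO|FORMAT)=<(.*)>$')
# A key, "=", then either a double-quoted string or a run of non-comma chars.
PAIR_RE = re.compile(r'([A-Za-z0-9_]+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_header_line(line):
    """Parse an ##INFO/##FORMAT line into a dict, or return None if it
    is not a structured INFO or FORMAT line."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    fields = {"kind": m.group(1)}
    for key, value in PAIR_RE.findall(m.group(2)):
        # Strip the surrounding quotes from quoted values.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields


if __name__ == "__main__":
    example = ('##INFO=<ID=MQ0F,Number=1,Type=Float,'
               'Description="Fraction of MQ0 reads (smaller is better)">')
    print(parse_header_line(example))
```

Validation of the individual values (for instance, which Number codes are allowed) would then be a separate step after the line has been split.
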
>
> I think with these two fixes, most VCFs that are the output of
> samtools mpileup will be ready to display directly in the browser.
>
> Example of the output of samtools mpileup:
> The header from the ##ALT line downward, and a single variant line:
>
>
> ##ALT=<ID=X,Description="Represents allele(s) other than observed.">
> ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the
> variant is an INDEL.">
> ##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of
> reads supporting an indel">
> ##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of
> reads supporting an indel">
> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
> ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias
> for filtering splice-site artefacts in RNA-seq data (bigger is
> better)",Version="3">
> ##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of
> Read Position Bias (bigger is better)">
> ##INFO=<ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of
> Mapping Quality Bias (bigger is better)">
> ##INFO=<ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of
> Base Quality Bias (bigger is better)">
> ##INFO=<ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test
> of Mapping Quality vs Strand Bias (bigger is better)">
> ##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based
> metric.">
> ##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads
> (smaller is better)">
> ##INFO=<ID=I16,Number=16,Type=Float,Description="Auxiliary tag used
> for calling, see description of bcf_callret1_t in bam2bcf.h">
> ##INFO=<ID=QS,Number=R,Type=Float,Description="Auxiliary tag used for
> calling">
> ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of
> Phred-scaled genotype likelihoods">
> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of
> high-quality bases">
> ##FORMAT=<ID=DV,Number=1,Type=Integer,Description="Number of
> high-quality non-reference bases">
> ##FORMAT=<ID=DPR,Number=R,Type=Integer,Description="Number of
> high-quality bases observed for each allele">
> ##INFO=<ID=DPR,Number=R,Type=Integer,Description="Number of
> high-quality bases observed for each allele">
> ##bcftools_annotateVersion=1.3.1+htslib-1.3.1
> ##bcftools_annotateCommand=annotate -a
> /home/opengeno/opt/data/arrays/Orientation/All_SNPs_dbSNP-147_with_plus_orientation_rsIds_alleles_type_filtered_for_single_and_exceptions_with_AX-ids.bed.gz
> -c CHROM,FROM,TO,ID
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT WC1
> 1 565596 rs9782892 G <X> 0 .
> DP=11;I16=5,6,0,0,433,17101,0,0,407,15059,0,0,164,3524,0,0;QS=1,0;MQSB=0.950952;MQ0F=0;DPR=11,0
> PL:DP:DV:DPR 0,33,255:11:0:11,0
>
> Thanks!
