igvtools index error message


Nicki

Apr 11, 2012, 12:05:36 PM
to igv-help
Hello

I am trying to index a vcf file using igvtools from within IGV and am
getting an error:

"Error: The provided VCF file is malformed at line number 40:
Unparsable vcf record with allele M "

Here is the top of the VCF file, including line 40:

##fileformat=VCFv4.0
##fileDate=2012-01-30
##source=Platypus_Version_0.1.5
##INFO=<ID=func,Number=1,Type=String,Description="Functional category: exnoic, intergenic etc">
##INFO=<ID=gene,Number=1,Type=String,Description="RefSeq gene name">
##INFO=<ID=exon_func,Number=1,Type=String,Description="Exonic function of the variant">
##INFO=<ID=AAchange,Number=1,Type=String,Description="Amino Acid change">
##INFO=<ID=cons46,Number=1,Type=String,Description="UCSC 46 species conservation score">
##INFO=<ID=segdup,Number=1,Type=Float,Description="UCSC segment duplication score">
##INFO=<ID=1000g,Number=1,Type=Float,Description="1000 genomes allelic frequency">
##INFO=<ID=dbsnp,Number=1,Type=String,Description="dbSNP ID">
##INFO=<ID=sift,Number=1,Type=Float,Description="SIFT score">
##INFO=<ID=pp2,Number=1,Type=Float,Description="PolyPhen2 score">
##INFO=<ID=phylop,Number=1,Type=Float,Description="Phylop score">
##INFO=<ID=mutT,Number=1,Type=Float,Description="Mutation Taster score">
##INFO=<ID=LRT,Number=1,Type=Float,Description="LRT score">
##INFO=<ID=FR,Number=0,Type=Float,Description="Estimated population frequency">
##INFO=<ID=RPV,Number=0,Type=Float,Description="Median minimum base quality for bases around variant">
##INFO=<ID=RPV,Number=0,Type=Float,Description="Reverse strand p-value">
##INFO=<ID=TCR,Number=0,Type=Integer,Description="Total reverse strand coverage at this locus">
##INFO=<ID=HP,Number=1,Type=Integer,Description="Homopolmer run length">
##INFO=<ID=ABPV,Number=0,Type=Float,Description="Allele-bias p-value. Testing for low variant coverage">
##INFO=<ID=TR,Number=0,Type=Integer,Description="Total number of reads containing this variant">
##INFO=<ID=PP,Number=0,Type=Float,Description="Posterior probability (phred scaled) that this variant segregates">
##INFO=<ID=NF,Number=0,Type=Integer,Description="Total number of forward reads containing this variant">
##INFO=<ID=SC,Number=1,Type=String,Description="Genomic sequence 10 bases either side of variant position">
##INFO=<ID=FPV,Number=0,Type=Float,Description="Forward strand p-value">
##INFO=<ID=TCF,Number=0,Type=Integer,Description="Total forward strand coverage at this locus">
##INFO=<ID=NR,Number=0,Type=Integer,Description="Total number of reverse reads containing this variant">
##INFO=<ID=RMP,Number=0,Type=Float,Description="RMS Position in reads of Variant">
##INFO=<ID=TC,Number=0,Type=Integer,Description="Total coverage at this locus">
##FILTER=<ID=sb,Description="Variant fails strand-bias filter">
##FILTER=<ID=ab,Description="Variant fails allele-bias filter">
##FILTER=<ID=badReads,Description="Variant supported only by reads with low quality bases close to variant position, and not present on both strands.">
##FILTER=<ID=hp10,Description="Flanking sequence contains homopolymer of length 10 or greater">
##FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype log-likelihoods (log10) for AA,AB and BB genotypes, where A = ref and B = variant. Only applicable for bi-allelic sites">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Unphased genotypes">
##FORMAT=<ID=NR,Number=1,Type=Integer,Description="Number of reads covering variant in this sample">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality, as Phred score">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AW_SC_4654.bam
1 10146 . AC A 118 PASS ABPV=4.72e-01;FPV=3.72e-09;FR=0.5003;HP=4;NF=6;NR=2;PP=118;RMP=65.56;RPV=3.43e-01;SC=CCTAACCCTAACCCCTAACCC;TC=31;TCF=25;TCR=6;TR=8;func=intergenic;gene=NONE(dist=NONE) WASH7P(dist=4215);segdup=0.99 GT:GL:GQ:NR 0/1:-182.49,-146.41,-145.22:32:32
1 12783 . G A 110 PASS ABPV=1.00e+00;FPV=1.00e+00;FR=1.0000;HP=1;MMLQ=33;NF=6;NR=0;PP=110;RMP=35.03;RPV=1.00e+00;SC=CGGGGCCGGCGTCTCCTGTCT;TC=6;TCF=6;TCR=0;TR=6;func=intergenic;gene=NONE(dist=NONE) WASH7P(dist=1579);segdup=0.99;1000g=0.58;dbsnp=rs62635284 GT:GL:GQ:NR 1/1:-32.34,-4.13,0.0:50:6
1 14464 . A T 154 PASS ABPV=1.00e+00;FPV=1.00e+00;FR=1.0000;HP=1;MMLQ=35;NF=10;NR=0;PP=154;RMP=51.55;RPV=1.00e+00;SC=TTAAGAACACAGTGGCGCAGG;TC=10;TCF=10;TCR=0;TR=10;func=ncRNA_exonic;gene=WASH7P;segdup=0.99;1000g=0.17 GT:GL:GQ:NR 1/1:-52.06,-15.75,-9.98:72:10
1 14930 . A G 64 PASS ABPV=1.00e+00;FPV=8.59e-01;FR=0.5000;HP=1;MMLQ=33;NF=18;NR=8;PP=64;RMP=63.67;RPV=1.47e-03;SC=ACAGAATTACAAGGTGCTGGC;TC=65;TCF=31;TCR=34;TR=26;func=ncRNA_intronic;gene=WASH7P;segdup=0.99;1000g=0.50;dbsnp=rs6682385 GT:GL:GQ:NR 1/0:-220.74,-198.5,-398.09:99:65
1 15118 . A G 28 PASS ABPV=1.00e+00;FPV=1.00e+00;FR=0.5000;HP=2;MMLQ=30;NF=0;NR=7;PP=28;RMP=45.42;RPV=3.15e-01;SC=CCCCCATGACACTCCCCAGCC;TC=17;TCF=0;TCR=17;TR=7;func=ncRNA_intronic;gene=WASH7P;segdup=0.99;1000g=0.35;dbsnp=rs11580262 GT:GL:GQ:NR 0/1:-39.1,-25.04,-63.71:64:17
1 15211 . T G 200 PASS ABPV=1.00e+00;FPV=1.00e+00;FR=0.5000;HP=1;MMLQ=32;NF=13;NR=18;PP=200;RMP=56.34;RPV=1.00e+00;SC=AGACAGCGGCTGTTTGAGGAG;TC=35;TCF=13;TCR=22;TR=31;func=ncRNA_intronic;gene=WASH7P;segdup=0.99;1000g=0.63;dbsnp=rs11586607 GT:GL:GQ:NR 1/0:-209.81,-120.44,-129.85:43:35

How do I fix this?

Thanks, Nicki
MRC Molecular Haematology Unit

Jim Robinson

Apr 11, 2012, 1:14:19 PM
to igv-...@googlegroups.com
Hi Nicki,

Could you try this with the pre-release igvtools 2.0? You can download it from here:

http://www.broadinstitute.org/igv/projects/downloads/igvtools_test.zip

or

http://www.broadinstitute.org/igv/projects/downloads/igvtools_nogenomes_test.zip

The second option is much smaller, but you will need to move or copy your "genomes" folder from the previous installation.

Thanks

Jim

Nicki Gray

Apr 13, 2012, 6:59:18 AM
to igv-...@googlegroups.com
Hi Jim

Thanks for your reply. 
I noticed that the VCF file I was given had whitespace in the INFO field. I have now replaced it with an underscore, but I am still getting the same error message.


I downloaded igvtools 2.0 as suggested and using this I get the following:

"The provided VCF file is malformed at approximately line number 2412715: Unparsable vcf record with allele M"

Lines 2412714-2412716 are below:


3 60830521 . C T 200 PASS ABPV=1.00e+00;FPV=1.90e-01;FR=1.0000;HP=2;MMLQ=33;NF=20;NR=22;PP=200;RMP=61.25;RPV=1.00e+00;SC=TCTTCATTAGCGCTACATAGC;TC=43;TCF=21;TCR=22;TR=42;func=intronic;gene=FHIT;1000g=0.90;dbsnp=rs1900668 GT:GL:GQ:NR 1/1:-561.04,-478.72,-466.88:99:43
3 60830534 . M C 200 PASS ABPV=1.00e+00;FPV=1.00e+00;FR=1.0000;HP=1;MMLQ=34;NF=21;NR=21;PP=200;RMP=61.16;RPV=1.00e+00;SC=TACATAGCTGMCTTATTATTC;TC=42;TCF=21;TCR=21;TR=42 GT:GL:GQ:NR 1/1:-579.73,-519.38,-507.47:100:42
3 60830545 . G T 116 PASS ABPV=1.00e+00;FPV=1.00e+00;FR=1.0000;HP=1;MMLQ=32;NF=17;NR=17;PP=116;RMP=59.86;RPV=1.00e+00;SC=CTTATTATTCGTGGTCCCCTA;TC=34;TCF=17;TCR=17;TR=34;func=intronic;gene=FHIT;1000g=0.89;dbsnp=rs2594129 GT:GL:GQ:NR 1/1:-469.5,-442.17,-435.84:79:34


Thanks, Nicki

---------------------
Nicki Gray
MRC Molecular Haematology Unit
01865 222434

Jim Robinson

Apr 14, 2012, 11:17:42 PM
to igv-...@googlegroups.com
Hi, apologies for the delayed response.  I needed to confirm with the "GATK" team who developed this code.  This is their response:

 One of the alleles is "M" which isn't allowed in the spec (it can only be A,C,G,T, or N).
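
To find every record with this problem before indexing, one option is to scan the REF and ALT columns for letters outside A, C, G, T and N. The Python sketch below is only an illustration and is not part of igvtools or the GATK. It assumes the file on disk is tab-delimited, and it skips "." and angle-bracketed symbolic alleles; the function name find_bad_alleles is hypothetical.

```python
import re

# Letters the reply above lists as permitted in an allele.
PLAIN_ALLELE = re.compile(r"^[ACGTN]+$", re.IGNORECASE)


def find_bad_alleles(lines):
    """Yield (line_number, column, allele) for alleles containing other letters."""
    for number, line in enumerate(lines, start=1):
        # Header lines and blank lines carry no alleles.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        for column, value in (("REF", fields[3]), ("ALT", fields[4])):
            for allele in value.split(","):
                # Assumption: "." (missing) and "<ID>" (symbolic) are left alone.
                if allele == "." or allele.startswith("<"):
                    continue
                if not PLAIN_ALLELE.match(allele):
                    yield number, column, allele
```

Passing an open file handle, for example find_bad_alleles(open("calls.vcf")) with a hypothetical file name, yields one tuple per offending allele, so the affected line numbers can be listed in a single pass.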

Nicki Gray

Apr 17, 2012, 6:36:04 AM
to igv-...@googlegroups.com
Hi 

Thanks for your email.

The VCF files I have been given contain IUPAC codes, so the M means A or C.
Can igvtools not handle IUPAC codes? Is the only way to get igvtools to work for me to write a script that converts the IUPAC codes to bases?

Thanks, Nicki

James Robinson

Apr 17, 2012, 8:59:22 AM
to igv-...@googlegroups.com
Hi Nicki,

The VCF specification does not allow IUPAC codes. What tool created this file? I think the best forum for this is one of the VCF mailing lists, or the GATK forum (the VCF reader is part of the GATK).

Converting them to a base or "N" is a workaround.
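
A minimal sketch of that workaround in Python, under several assumptions: the file is tab-delimited, only the REF and ALT columns are changed, and every IUPAC ambiguity letter (R, Y, S, W, K, M, B, D, H, V) is replaced with "N" rather than with a specific base. The function names are hypothetical and this is not an igvtools or GATK feature.

```python
import re

# IUPAC nucleotide ambiguity letters other than N.
AMBIGUITY = re.compile(r"[RYSWKMBDHV]", re.IGNORECASE)


def mask_allele(allele):
    """Replace ambiguity letters with N; leave "." and "<ID>" alleles unchanged."""
    if allele == "." or allele.startswith("<"):
        return allele
    return AMBIGUITY.sub("N", allele)


def mask_record(line):
    """Return the VCF line with REF (column 4) and ALT (column 5) masked."""
    if line.startswith("#"):
        return line  # header lines pass through untouched
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return line  # not a data record; leave as-is
    fields[3] = mask_allele(fields[3])
    fields[4] = ",".join(mask_allele(a) for a in fields[4].split(","))
    return "\t".join(fields) + "\n"


def mask_vcf(in_path, out_path):
    """Write a masked copy of in_path to out_path, one record at a time."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(mask_record(line))
```

For example, mask_vcf("input.vcf", "masked.vcf") (hypothetical file names) writes a copy in which the record at 3:60830534 has REF "N" instead of "M". The INFO column, including the SC flanking sequence, is deliberately left alone, and masking discards the information about which bases the code stood for.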

-- Jim

Nicki Gray

Apr 17, 2012, 9:19:34 AM
to igv-...@googlegroups.com
Hi Jim

This file was created using Platypus_Version_0.1.5 (an unpublished program written at WTCHG).

WTCHG do use IGV on files from Platypus and so must have come up against this problem - I'll see if I can contact someone there and find out what they do.

Thanks for your help,
Nicki

James Robinson

Apr 17, 2012, 9:32:41 AM
to igv-...@googlegroups.com
Nicki, that's not to say there aren't tools that can use this. We use a VCF reader that follows the spec strictly. Again, the VCF or GATK forums are the place to go; I'm not the expert on this.
