Exception on loading VCF file


Leon Avery

Jan 27, 2013, 9:33:40 AM
to igv-...@googlegroups.com
When I try to load the attached VCF file into IGV, I get the following error. "About" shows: IGV 2.2.4 (6) 01/14/2013 03:06 PM. This is MacOS 10.7.5.
INFO [2013-01-27 09:12:26,736]  [IGV.java:1452] [pool-1-thread-3]  Loading 1 resources.
WARN [2013-01-27 09:12:26,793]  [VCFStandardHeaderLines.java:218] [pool-1-thread-3]  Repairing standard header line for field MQ because -- type disagree; header has Integer but standard is Float -- descriptions disagree; header has 'Root-mean-square mapping quality of covering reads' but standard is 'RMS Mapping Quality'
WARN [2013-01-27 09:12:26,793]  [VCFStandardHeaderLines.java:218] [pool-1-thread-3]  Repairing standard header line for field AC because -- count types disagree; header has UNBOUNDED but standard is A -- descriptions disagree; header has 'Allele count in genotypes' but standard is 'Allele count in genotypes, for each ALT allele, in the same order as listed'
ERROR [2013-01-27 09:12:26,812]  [IGV.java:1491] [pool-1-thread-3]  Error loading tracks
java.lang.RuntimeException:
Line: chrII    3797023    .    T    C,A    42.50    .    AC1=2;AC=12,0;AF1=1;AN=12;DP4=1,2,51,69;DP=187;EFF=DOWNSTREAM(MODIFIER||||158|Y8A9A.3|protein_coding|CODING|Y8A9A.3|),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|cTg/cAg|L399Q|1360|Y8A9A.2|protein_coding|CODING|Y8A9A.2|),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|cTg/cCg|L399P|1360|Y8A9A.2|protein_coding|CODING|Y8A9A.2|),UPSTREAM(MODIFIER||||282|fbxb-99|protein_coding|CODING|Y8A9A.5|),UPSTREAM(MODIFIER|||||Y8A9A.1|pseudogene|NON_CODING|Y8A9A.1|);FQ=-65;MQ=11;PV4=0.33,0.032,0.41,1;SF=3,4,5,6,7,8;VDB=0.0374    GT:GQ:PL    .    .    .    1/1:68:68,38,0,68,16,61    1/1:56:71,30,0,.,.,.    1/1:87:72,51,0,77,34,71    1/1:87:81,46,0,.,.,.    1/1:80:80,42,0,.,.,.    1/1:99:81,84,0,85,57,77
    at org.broad.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:68)
    at org.broad.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:201)
    at org.broad.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:181)
    at org.broad.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:148)
    at org.broad.igv.track.TribbleFeatureSource.initFeatureWindowSize(TribbleFeatureSource.java:158)
    at org.broad.igv.track.TribbleFeatureSource.init(TribbleFeatureSource.java:80)
    at org.broad.igv.track.TribbleFeatureSource.<init>(TribbleFeatureSource.java:62)
    at org.broad.igv.track.TrackLoader.loadIndexed(TrackLoader.java:299)
    at org.broad.igv.track.TrackLoader.load(TrackLoader.java:160)
    at org.broad.igv.track.TrackLoader.load(TrackLoader.java:101)
    at org.broad.igv.ui.IGV.load(IGV.java:1529)
    at org.broad.igv.ui.IGV$10.run(IGV.java:1473)
    at org.broad.igv.ui.IGV.loadResources(IGV.java:1500)
    at org.broad.igv.ui.IGV$4.run(IGV.java:618)
    at org.broad.igv.util.LongRunningTask.call(LongRunningTask.java:54)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.NumberFormatException: For input string: "."
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:449)
    at java.lang.Integer.valueOf(Integer.java:554)
    at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec.decodeInts(AbstractVCFCodec.java:690)
    at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:651)
    at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:92)
    at org.broadinstitute.sting.utils.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:130)
    at org.broadinstitute.sting.utils.variantcontext.LazyGenotypesContext.ensureSampleNameMap(LazyGenotypesContext.java:151)
    at org.broadinstitute.sting.utils.variantcontext.GenotypesContext.getSampleNames(GenotypesContext.java:634)
    at org.broadinstitute.sting.utils.variantcontext.VariantContext.getSampleNames(VariantContext.java:842)
    at org.broad.igv.variant.vcf.VCFVariant.getSampleNames(VCFVariant.java:189)
    at org.broad.igv.variant.vcf.VCFVariant.init(VCFVariant.java:52)
    at org.broad.igv.variant.vcf.VCFVariant.<init>(VCFVariant.java:47)
    at org.broad.igv.feature.tribble.VCFWrapperCodec.decode(VCFWrapperCodec.java:48)
    at org.broad.igv.feature.tribble.VCFWrapperCodec.decode(VCFWrapperCodec.java:25)
    at org.broad.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:65)
    ... 19 more
INFO [2013-01-27 09:12:26,814]  [MessageUtils.java:60] [pool-1-thread-3]  Error loading /Volumes/leon Home/science/WGS/12_10_16/DAs.vcf:
Line: chrII    3797023    .    T    C,A    42.50    .    AC1=2;AC=12,0;AF1=1;AN=12;DP4=1,2,51,69;DP=187;EFF=DOWNSTREAM(MODIFIER||||158|Y8A9A.3|protein_coding|CODING|Y8A9A.3|),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|cTg/cAg|L399Q|1360|Y8A9A.2|protein_coding|CODING|Y8A9A.2|),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|cTg/cCg|L399P|1360|Y8A9A.2|protein_coding|CODING|Y8A9A.2|),UPSTREAM(MODIFIER||||282|fbxb-99|protein_coding|CODING|Y8A9A.5|),UPSTREAM(MODIFIER|||||Y8A9A.1|pseudogene|NON_CODING|Y8A9A.1|);FQ=-65;MQ=11;PV4=0.33,0.032,0.41,1;SF=3,4,5,6,7,8;VDB=0.0374    GT:GQ:PL    .    .    .    1/1:68:68,38,0,68,16,61    1/1:56:71,30,0,.,.,.    1/1:87:72,51,0,77,34,71    1/1:87:81,46,0,.,.,.    1/1:80:80,42,0,.,.,.    1/1:99:81,84,0,85,57,77

Leon Avery

Jan 27, 2013, 9:39:35 AM
to igv-...@googlegroups.com
Sorry, here is the attachment.
DAs.vcf

Jacob Silterra

Jan 28, 2013, 11:23:25 AM
to igv-...@googlegroups.com
The line has a comma-separated list of integers, some of which are present and some of which are simply ".". IGV supports completely missing fields (which would simply have a ".") but does not currently support a field in which only certain elements are missing. Fields like this:

1/1:56:71,30,0,.,.,.

will cause the error you are seeing.
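
To illustrate the failure, here is a minimal Java sketch. It is not the GATK code: `PartialMissingDemo` and `decodeInts` are hypothetical names, and the only assumption is that each comma-separated element is handed to `Integer.valueOf`, as the stack trace above suggests.

```java
import java.util.ArrayList;
import java.util.List;

final class PartialMissingDemo {

    /** Naive decoder (hypothetical): parses every comma-separated element as an integer. */
    static List<Integer> decodeInts(String field) {
        List<Integer> values = new ArrayList<>();
        for (String token : field.split(",")) {
            // Integer.valueOf throws NumberFormatException when token is "."
            values.add(Integer.valueOf(token));
        }
        return values;
    }

    public static void main(String[] args) {
        // A fully populated PL field parses without trouble.
        System.out.println(decodeInts("68,38,0,68,16,61"));

        // A partially missing PL field fails on the first "." element.
        try {
            decodeInts("71,30,0,.,.,.");
        } catch (NumberFormatException e) {
            System.out.println("Failed: " + e.getMessage());
        }
    }
}
```

A sample column that is entirely "." never reaches this kind of element-by-element parsing, which would explain why only the mixed case fails.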


--
Jacob Silterra
Software Engineer
Broad Institute

Leon Avery

Jan 28, 2013, 1:52:36 PM
to igv-...@googlegroups.com
Thanks. I'll write a script to filter those out. I don't suppose there is a specification anywhere of what IGV *WILL* accept? I'll have to replace the dots with something...

--
Leon Avery
lav...@vcu.edu
Department of Physiology and Biophysics
Virginia Commonwealth University
P.O. Box 980551
1220 E. Broad
Molecular Medicine Research Building, Rm 2044
Richmond, Virginia 23298-0551
804-628-2296 / fax 804-828-9492

Jacob Silterra

Jan 28, 2013, 2:10:15 PM
to igv-...@googlegroups.com
The vcf spec is at http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41, and we follow that spec. Since that is an integer field you'll need to replace the missing values with an integer.

This spec doesn't define the appropriate behavior in this case. Last I heard, it was under discussion on the vcftools-spec mailing list (https://lists.sourceforge.net/lists/listinfo/vcftools-spec).



Leon Avery

Jan 28, 2013, 2:29:47 PM
to igv-...@googlegroups.com, Jacob Silterra
OK. I believe the problematic partial info is being produced by vcf-merge. I think I can get away with replacing all the dots in PL with 0s, or maybe deleting it altogether. It contains no useful information for my application, anyway.

Thanks,

Leon Avery

Jim Robinson

Jan 28, 2013, 3:55:52 PM
to igv-...@googlegroups.com, Jacob Silterra
That should work. I assume these files were produced with vcf-tools? We use the GATK to read VCF files, and at the moment there is apparently no consensus on this case between the GATK developers and the developers of vcf-tools.

Jim

Leon Avery

Jan 28, 2013, 5:13:54 PM
to igv-...@googlegroups.com
I used vcf-tools only for the last step, merging VCF files from a bunch of different strains.


Hamid Younesy

May 1, 2013, 2:21:26 PM
to igv-...@googlegroups.com
I had the same issue trying to use the variant files from mouse genomes project: ftp://ftp-mouse.sanger.ac.uk/REL-1211-SNPs_Indels/
I added a temporary fix (hack?) to the source code. A better job could probably be done, but it seems to have solved the issue for me:

In org/broad/igv/feature/tribble/VCFWrapperCodec.java, I modified the following:

public VCFVariant decode(String line) {
    VariantContext vc = (VariantContext) wrappedCodec.decode(line);
    if (vc == null) {
        return null;
    }
    String chr = genome == null ? vc.getChr() : genome.getChromosomeAlias(vc.getChr());
    return new VCFVariant(vc, chr);
}

to:

public VCFVariant decode(String line) {
    // HY: temporary fix for the following:
    // IGV supports completely missing fields (which would simply have a ".")
    // but does not currently support missing only certain elements of a field.
    // https://groups.google.com/forum/?fromgroups=#!msg/igv-help/7-uTVcACAEE/8mRaMDhY1JwJ

    byte[] bline = line.getBytes();
    boolean bChanged = false;
    byte cp = 0, cc = 0, cn = 0; // previous, current, next character
    // HY: replace ",.," with ",0,". Could also use line.split(","), but this should be faster.
    // Note: any "." that touches a comma on either side is rewritten, anywhere in the line.
    for (int i = 0; i < bline.length; ++i) {
        cn = (i < bline.length - 1) ? bline[i + 1] : 0;
        cc = bline[i];
        if (cc == '.' && (cp == ',' || cn == ',')) {
            bline[i] = '0';
            bChanged = true;
        }
        cp = cc; // remember the original character, not the replacement
    }
    // Only allocate a new String when something was actually replaced.
    String newLine = bChanged ? new String(bline) : line;
    VariantContext vc = (VariantContext) wrappedCodec.decode(newLine);

    if (vc == null) {
        return null;
    }
    String chr = genome == null ? vc.getChr() : genome.getChromosomeAlias(vc.getChr());
    return new VCFVariant(vc, chr);
}

Tommy Tang

Jun 20, 2013, 11:24:05 AM
to igv-...@googlegroups.com, Jacob Silterra
Hi Leon Avery,

I had the same problem. If you have a shell script or Python script for cleaning up the file, could you please share it with me? Thanks!

Tommy


Leon Avery

Jun 20, 2013, 11:38:46 AM
to igv-...@googlegroups.com, Tommy Tang, Jacob Silterra
Well, I have to admit that I just did a sloppy quick-fix, rather than really solving the problem. This is what I use (in a Makefile):
ESs.vcf: $(foreach str,$(ESS),$(str)/$(str)br.vcf.gz.tbi)
    vcf-merge $(^:%.tbi=%) | \
    sed -e 's/,\./,0/g' >! $@
The sed command is what does it. The sloppiness, of course, is doing it on the WHOLE file, rather than just the field in question. I checked that ",." appears nowhere else in my VCF files, but that's obviously a dangerous assumption in general.
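
A narrower alternative is to touch only the PL subfield of the sample columns. The following Java sketch is one way to do that, under stated assumptions: `PlDotFilter` and `fixLine` are hypothetical names, columns are tab-separated with FORMAT in the ninth column as in the VCF 4.1 layout, and replacing a missing PL element with 0 is acceptable for the application (as it was in this thread).

```java
import java.util.Arrays;

final class PlDotFilter {

    /** Replaces "." elements with "0", but only inside the PL subfield of sample columns. */
    static String fixLine(String line) {
        if (line.startsWith("#")) {
            return line; // header lines pass through untouched
        }
        String[] cols = line.split("\t", -1);
        if (cols.length < 10) {
            return line; // no FORMAT and sample columns present
        }
        // Locate PL within the FORMAT column, e.g. "GT:GQ:PL" gives index 2.
        int plIndex = Arrays.asList(cols[8].split(":")).indexOf("PL");
        if (plIndex < 0) {
            return line;
        }
        for (int c = 9; c < cols.length; c++) {
            String[] sub = cols[c].split(":", -1);
            // Skip samples that are wholly missing (".") or whose PL is a lone value.
            if (plIndex >= sub.length || !sub[plIndex].contains(",")) {
                continue;
            }
            String[] pl = sub[plIndex].split(",", -1);
            for (int i = 0; i < pl.length; i++) {
                if (pl[i].equals(".")) {
                    pl[i] = "0";
                }
            }
            sub[plIndex] = String.join(",", pl);
            cols[c] = String.join(":", sub);
        }
        return String.join("\t", cols);
    }

    public static void main(String[] args) {
        String line = "chrII\t3797023\t.\tT\tC,A\t42.50\t.\tDP=187\tGT:GQ:PL\t.\t1/1:56:71,30,0,.,.,.";
        System.out.println(fixLine(line));
    }
}
```

Because the INFO column and the wholly missing samples are never rewritten, a ",." that appears elsewhere on the line is left alone.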


Jacob Silterra

Jun 25, 2013, 4:23:51 PM
to igv-...@googlegroups.com
The latest snapshot build of IGV has a workaround in place for this issue. It tries to parse each line and, if parsing fails, replaces ",." and ".," with ",0" and "0,", respectively. It has a similar drawback: although it doesn't do the string replacement across the whole file, it does apply it to the whole line, not just the genotype fields.

The commit is at https://github.com/broadinstitute/IGV/commit/97977073b7ec31326e7e156f3f0e7e3472151aa7, comments welcome.
Assuming there are no objections this will be in the next point release of IGV.

-Jacob
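
The parse-then-retry idea described above can be sketched in Java as follows. This is not the code from the commit: `FallbackDecode`, `decodeWithFallback`, and `sumInts` are hypothetical names, and the decoder is passed in as a plain `Function` so that the example stays self-contained.

```java
import java.util.function.Function;

final class FallbackDecode {

    /**
     * Tries the decoder on the original line first. Only if that fails is the
     * line patched (",." becomes ",0" and ".," becomes "0,") and decoded again.
     */
    static <T> T decodeWithFallback(String line, Function<String, T> decoder) {
        try {
            return decoder.apply(line);
        } catch (RuntimeException e) {
            String patched = line.replace(",.", ",0").replace(".,", "0,");
            return decoder.apply(patched); // a second failure propagates to the caller
        }
    }

    /** Stand-in decoder for the demo: sums a comma-separated list of integers. */
    static Integer sumInts(String field) {
        int sum = 0;
        for (String token : field.split(",")) {
            sum += Integer.parseInt(token);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(decodeWithFallback("68,38,0", FallbackDecode::sumInts));
        System.out.println(decodeWithFallback("71,30,0,.,.,.", FallbackDecode::sumInts));
    }
}
```

Lines that already parse are never modified, so the whole-line replacement can only affect lines that would otherwise have been rejected.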


Sebastian Sośnik

Jan 10, 2014, 5:41:34 AM
to igv-...@googlegroups.com
I had the same problem in my VCF: "." (dot) in the PL array indicating a single missing value. 
The fix recommended by Hamid Younesy works very well. Thank you for publishing the idea.
In my implementation, it looks as follows:

public class MyVCFCodec extends VCFCodec {

    /**
     * Override in order to support "." (missing value) in PL array
     */
    @Override
    public LazyGenotypesContext.LazyData createGenotypeMap(final String str, final List<Allele> alleles, final String chr, final int pos) {
        // Replace any "." adjacent to a comma with "0" before delegating to the standard codec.
        String s = str.replaceAll("\\.,", "0,").replaceAll(",\\.", ",0");
        return super.createGenotypeMap(s, alleles, chr, pos);
    }
}



