String index out of range: 2


Alexander Berry

Aug 13, 2016, 12:27:39 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi,

I am trying to load a VCF file into Tassel 5 and I am getting a String index out of range: 2 error. I've been successfully using this software but this file just isn't loading and I cannot figure out why. Any insight here would be greatly appreciated. 

Thanks,

Alex

Here is the debug info:


[AWT-EventQueue-0] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.29  Date: August 4, 2016
[AWT-EventQueue-0] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 3641 MB
[AWT-EventQueue-0] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_40
[AWT-EventQueue-0] INFO net.maizegenetics.tassel.TasselLogging - OS: Mac OS X
[AWT-EventQueue-0] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 4
[Thread-7] INFO net.maizegenetics.analysis.data.FileLoadPlugin - Start Loading File: /Users/alex/Desktop/cruzi_data/Organized/First_60/HaplotypeCaller/test_haploid_1_DP20_remove_ast_excel.vcf time: Aug 13, 2016 12:23:39
Err Site Number:0
Err Site Number:0
Err Site Number:Position Chr:KB222878.1 Pos:486638 Name:SKB222878.1_486638 Variants:A/G MAF:NaN Ref:A
Err:KB222878.1 486638 . A G 6685.2 . AC=4;AF=1.00;AN=4;DP=178;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=41.76;QD=32.20;SOR=0.716 GT:AD:DP:GQ:PL 1:0,38:38:99:1370,0 1:0,47:47:99:1781,0 1:0,50:50:99:1879,0 1:0,43:43:99:1682,0
Err Site Number:0
Err Site Number:Position Chr:KB222880.1 Pos:241058 Name:SKB222880.1_241058 Variants:A/G MAF:NaN Ref:A
Err:KB222880.1 241058 . A G 365.42 . AC=2;AF=0.500;AN=4;BaseQRankSum=1.38;ClippingRankSum=1.76;DP=177;FS=8.112;MLEAC=2;MLEAF=0.500;MQ=41.98;MQRankSum=0.263;QD=3.62;ReadPosRankSum=1.42;SOR=1.450 GT:AD:DP:GQ:PL 1:20,25:45:99:195,0 1:25,31:56:99:205,0 0:42,0:42:99:0,294 0:34,0:34:99:0,155
Err Site Number:Position Chr:KB222877.1 Pos:1258 Name:SKB222877.1_1258 Variants:G/A MAF:NaN Ref:G
Err:KB222877.1 1258 . G A 7332.2 . AC=4;AF=1.00;AN=4;DP=190;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=42.00;QD=26.02;SOR=0.693 GT:AD:DP:GQ:PL 1:0,50:50:99:1912,0 1:0,35:35:99:1369,0 1:0,53:53:99:2015,0 1:0,52:52:99:2063,0
Err Site Number:0
Err Site Number:Position Chr:KB222882.1 Pos:148739 Name:SKB222882.1_148739 Variants:C/G MAF:NaN Ref:C
Err:KB222882.1 148739 . C G 6369.2 . AC=4;AF=1.00;AN=4;DP=170;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=41.75;QD=29.02;SOR=1.106 GT:AD:DP:GQ:PL 1:0,38:38:99:1453,0 1:0,45:45:99:1616,0 1:0,41:41:99:1553,0 1:0,46:46:99:1774,0
Err Site Number:0
Err Site Number:Position Chr:KB222885.1 Pos:20487 Name:SKB222885.1_20487 Variants:G/A MAF:NaN Ref:G
Err:KB222885.1 20487 . G A 4129.2 . AC=4;AF=1.00;AN=4;DP=115;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=29.61;QD=25.50;SOR=1.219 GT:AD:DP:GQ:PL 1:0,30:30:99:1067,0 1:0,31:31:99:1101,0 1:0,30:30:99:1089,0 1:0,24:24:99:899,0
Err Site Number:0
Err Site Number:Position Chr:KB222887.1 Pos:220351 Name:SKB222887.1_220351 Variants:T/C MAF:NaN Ref:T
Err:KB222887.1 220351 . T C 6602.2 . AC=4;AF=1.00;AN=4;DP=188;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=37.54;QD=25.99;SOR=0.759 GT:AD:DP:GQ:PL 1:0,53:53:99:1920,0 1:0,28:28:99:993,0 1:0,47:47:99:1706,0 1:0,58:58:99:2010,0
Err Site Number:0
Err Site Number:Position Chr:KB222891.1 Pos:116432 Name:SKB222891.1_116432 Variants:G/T MAF:NaN Ref:G
Err:KB222891.1 116432 . G T 332.42 . AC=2;AF=0.500;AN=4;BaseQRankSum=0.860;ClippingRankSum=0.090;DP=184;FS=5.047;MLEAC=2;MLEAF=0.500;MQ=41.55;MQRankSum=-1.401e+00;QD=3.82;ReadPosRankSum=1.30;SOR=0.433 GT:AD:DP:GQ:PL 0:41,0:41:99:0,170 1:16,23:39:99:201,0 0:28,28:56:83:0,83 1:22,26:48:99:166,0
Err Site Number:0
Err Site Number:Position Chr:KB222895.1 Pos:264926 Name:SKB222895.1_264926 Variants:C/T MAF:NaN Ref:C
Err:KB222895.1 264926 . C T 770.18 . AC=1;AF=0.250;AN=4;BaseQRankSum=0.837;ClippingRankSum=0.131;DP=161;FS=10.013;MLEAC=1;MLEAF=0.250;MQ=41.47;MQRankSum=0.376;QD=12.63;ReadPosRankSum=-1.483e+00;SOR=0.144 GT:AD:DP:GQ:PL 0:29,0:29:99:0,107 0:25,0:25:99:0,331 1:20,41:61:99:803,0 0:46,0:46:99:0,363
Err Site Number:0
Err Site Number:Position Chr:KB222899.1 Pos:234600 Name:SKB222899.1_234600 Variants:C/T MAF:NaN Ref:C
Err:KB222899.1 234600 . C T 8180.2 . AC=4;AF=1.00;AN=4;DP=224;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=41.91;QD=33.68;SOR=0.720 GT:AD:DP:GQ:PL 1:0,53:53:99:1936,0 1:0,55:55:99:2025,0 1:0,52:52:99:1909,0 1:0,63:63:99:2337,0
Err Site Number:0
Err Site Number:Position Chr:KB222906.1 Pos:49972 Name:SKB222906.1_49972 Variants:A/T MAF:NaN Ref:A
Err:KB222906.1 49972 . A T 5446.2 . AC=4;AF=1.00;AN=4;DP=142;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=40.03;QD=31.33;SOR=0.780 GT:AD:DP:GQ:PL 1:0,36:36:99:1436,0 1:0,31:31:99:1228,0 1:0,34:34:99:1309,0 1:0,41:41:99:1500,0
Err Site Number:0
Err Site Number:Position Chr:KB222912.1 Pos:37091 Name:SKB222912.1_37091 Variants:G/A MAF:NaN Ref:G
Err:KB222912.1 37091 . G A 7933.2 . AC=4;AF=1.00;AN=4;DP=186;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=41.33;QD=24.30;SOR=0.759 GT:AD:DP:GQ:PL 1:0,58:58:99:2510,0 1:0,45:45:99:1923,0 1:0,39:39:99:1599,0 1:0,44:44:99:1928,0
Err Site Number:0
Err Site Number:Position Chr:KB222918.1 Pos:118066 Name:SKB222918.1_118066 Variants:C/T MAF:NaN Ref:C
Err:KB222918.1 118066 . C T 1979.18 . AC=3;AF=0.750;AN=4;BaseQRankSum=0.972;ClippingRankSum=-4.930e-01;DP=214;FS=6.127;MLEAC=3;MLEAF=0.750;MQ=41.57;MQRankSum=0.190;QD=12.22;ReadPosRankSum=0.012;SOR=0.988 GT:AD:DP:GQ:PL 1:17,39:56:99:885,0 1:21,40:61:99:699,0 0:51,0:51:99:0,513 1:17,28:45:99:428,0
Err Site Number:0
Err Site Number:Position Chr:KB222927.1 Pos:26006 Name:SKB222927.1_26006 Variants:T/C MAF:NaN Ref:T
Err:KB222927.1 26006 . T C 9435.2 . AC=4;AF=1.00;AN=4;DP=212;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=41.91;QD=35.16;SOR=1.131 GT:AD:DP:GQ:PL 1:0,52:52:99:2342,0 1:0,44:44:99:1930,0 1:0,58:58:99:2618,0 1:0,57:57:99:2572,0
Err Site Number:0
Err Site Number:Position Chr:KB222937.1 Pos:23770 Name:SKB222937.1_23770 Variants:A/C MAF:NaN Ref:A
Err:KB222937.1 23770 . A C 6629.2 . AC=4;AF=1.00;AN=4;DP=175;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=41.98;QD=33.48;SOR=0.751 GT:AD:DP:GQ:PL 1:0,40:40:99:1532,0 1:0,35:35:99:1355,0 1:0,37:37:99:1425,0 1:0,63:63:99:2344,0
Err Site Number:0
Err Site Number:Position Chr:KB222946.1 Pos:104445 Name:SKB222946.1_104445 Variants:G/T MAF:NaN Ref:G
Err:KB222946.1 104445 . G T 73.18 . AC=1;AF=0.250;AN=4;BaseQRankSum=1.09;ClippingRankSum=-5.130e-01;DP=108;FS=3.154;MLEAC=1;MLEAF=0.250;MQ=37.94;MQRankSum=2.54;QD=2.03;ReadPosRankSum=0.642;SOR=0.257 GT:AD:DP:GQ:PL 0:22,0:22:99:0,629 0:23,0:23:99:0,450 0:27,0:27:99:0,540 1:15,21:36:99:106,0
Err Site Number:0
Err Site Number:Position Chr:KB222960.1 Pos:40110 Name:SKB222960.1_40110 Variants:G/A MAF:NaN Ref:G
Err:KB222960.1 40110 . G A 6363.2 . AC=4;AF=1.00;AN=4;DP=166;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=41.84;QD=21.45;SOR=0.874 GT:AD:DP:GQ:PL 1:0,40:40:99:1537,0 1:0,45:45:99:1800,0 1:0,44:44:99:1646,0 1:0,37:37:99:1407,0
Err Site Number:0
Err Site Number:Position Chr:KB222979.1 Pos:64987 Name:SKB222979.1_64987 Variants:G/A MAF:NaN Ref:G
Err:KB222979.1 64987 . G A 504.42 . AC=2;AF=0.500;AN=4;BaseQRankSum=2.79;ClippingRankSum=0.308;DP=177;FS=4.103;MLEAC=2;MLEAF=0.500;MQ=41.91;MQRankSum=0.450;QD=5.67;ReadPosRankSum=1.82;SOR=1.201 GT:AD:DP:GQ:PL 0:47,0:47:99:0,513 0:41,0:41:99:0,915 1:19,26:45:99:329,0 1:19,25:44:99:210,0
Err Site Number:0
Err Site Number:Position Chr:KB223014.1 Pos:25714 Name:SKB223014.1_25714 Variants:T/C MAF:NaN Ref:T
Err:KB223014.1 25714 . T C 6811.2 . AC=4;AF=1.00;AN=4;DP=186;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=36.00;QD=34.56;SOR=0.759 GT:AD:DP:GQ:PL 1:0,51:51:99:2251,0 1:0,60:60:99:2043,0 1:0,27:27:99:921,0 1:0,48:48:99:1623,0
Err Site Number:0
Err Site Number:Position Chr:KB223164.1 Pos:2689 Name:SKB223164.1_2689 Variants:G/A MAF:NaN Ref:G
Err:KB223164.1 2689 . G A 2598.2 . AC=4;AF=1.00;AN=4;BaseQRankSum=-4.130e-01;ClippingRankSum=-1.734e+00;DP=103;FS=3.139;MLEAC=4;MLEAF=1.00;MQ=22.93;MQRankSum=-5.780e-01;QD=25.23;ReadPosRankSum=-1.734e+00;SOR=0.348 GT:AD:DP:GQ:PL 1:0,28:28:99:742,0 1:1,20:21:99:432,0 1:0,27:27:99:724,0 1:0,27:27:99:727,0
[Thread-7] DEBUG net.maizegenetics.dna.snp.io.BuilderFromVCF - java.lang.StringIndexOutOfBoundsException: String index out of range: 2
java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: 2
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at net.maizegenetics.dna.snp.io.BuilderFromVCF.buildEngine(BuilderFromVCF.java:211)
at net.maizegenetics.dna.snp.io.BuilderFromVCF.build(BuilderFromVCF.java:114)
at net.maizegenetics.dna.snp.ImportUtils.readFromVCF(ImportUtils.java:110)
at net.maizegenetics.dna.snp.ImportUtils.readFromVCF(ImportUtils.java:116)
at net.maizegenetics.analysis.data.FileLoadPlugin.processDatum(FileLoadPlugin.java:465)
at net.maizegenetics.analysis.data.FileLoadPlugin.performFunction(FileLoadPlugin.java:249)
at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1652)
at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 2
at java.lang.String.charAt(String.java:646)
at net.maizegenetics.dna.snp.io.ProcessVCFBlock.call(BuilderFromVCF.java:606)
at net.maizegenetics.dna.snp.io.ProcessVCFBlock.call(BuilderFromVCF.java:423)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[Thread-7] ERROR net.maizegenetics.analysis.data.FileLoadPlugin - java.lang.StringIndexOutOfBoundsException: String index out of range: 2
java.lang.IllegalStateException: java.lang.StringIndexOutOfBoundsException: String index out of range: 2
at net.maizegenetics.dna.snp.io.BuilderFromVCF.buildEngine(BuilderFromVCF.java:215)
at net.maizegenetics.dna.snp.io.BuilderFromVCF.build(BuilderFromVCF.java:114)
at net.maizegenetics.dna.snp.ImportUtils.readFromVCF(ImportUtils.java:110)
at net.maizegenetics.dna.snp.ImportUtils.readFromVCF(ImportUtils.java:116)
at net.maizegenetics.analysis.data.FileLoadPlugin.processDatum(FileLoadPlugin.java:465)
at net.maizegenetics.analysis.data.FileLoadPlugin.performFunction(FileLoadPlugin.java:249)
at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1652)
at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
[Thread-7] DEBUG net.maizegenetics.analysis.data.FileLoadPlugin - java.lang.StringIndexOutOfBoundsException: String index out of range: 2
java.lang.IllegalStateException: java.lang.StringIndexOutOfBoundsException: String index out of range: 2
at net.maizegenetics.dna.snp.io.BuilderFromVCF.buildEngine(BuilderFromVCF.java:215)
at net.maizegenetics.dna.snp.io.BuilderFromVCF.build(BuilderFromVCF.java:114)
at net.maizegenetics.dna.snp.ImportUtils.readFromVCF(ImportUtils.java:110)
at net.maizegenetics.dna.snp.ImportUtils.readFromVCF(ImportUtils.java:116)
at net.maizegenetics.analysis.data.FileLoadPlugin.processDatum(FileLoadPlugin.java:465)
at net.maizegenetics.analysis.data.FileLoadPlugin.performFunction(FileLoadPlugin.java:249)
at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1652)
at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
[Thread-7] INFO net.maizegenetics.analysis.data.FileLoadPlugin - Nothing Loaded for File: /Users/alex/Desktop/cruzi_data/Organized/First_60/HaplotypeCaller/test_haploid_1_DP20_remove_ast_excel.vcf time: Aug 13, 2016 12:23:46

Zack Miller

Aug 13, 2016, 1:28:21 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Alex,

Your file does not have diploid genotypes.  TASSEL currently only supports loading in VCF files with diploid calls.  

For instance you have calls which look like 1:17,39:... when TASSEL is expecting 1/1:17,39:...
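
One way to confirm this on a given file is to tally how many alleles each GT entry carries. This is only a rough sketch and not part of TASSEL: `gt_ploidy_counts` is a hypothetical helper name, and it assumes a tab-separated VCF whose sample columns start at column 10 with GT as the first entry of each sample field.

```shell
# Hypothetical helper (not a TASSEL tool): reads a VCF on stdin and prints
# "<alleles per GT> <number of calls>" for each ploidy it finds.
# Assumes GT is the first colon-separated entry of every sample field.
gt_ploidy_counts() {
    awk -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            for (i = 10; i <= NF; i++) {     # sample columns start at column 10
                split($i, parts, ":")        # parts[1] is the GT entry
                n = split(parts[1], alleles, "[/|]")
                count[n]++
            }
        }
        END { for (n in count) print n, count[n] }
    ' | sort -n
}
```

Running `gt_ploidy_counts < haploid.vcf` on a file like the one above should report only calls with 1 allele, whereas a file TASSEL can load would report calls with 2.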

Thanks,
Zack

Alexander Berry

Aug 14, 2016, 1:34:19 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Ah OK thanks. I didn't realize that Tassel only dealt with diploids. 

Thanks,

Alex

Edward S. Buckler

Aug 14, 2016, 1:40:22 PM
to tas...@googlegroups.com
Hello Alex-
What ploidy is the species that you are working with?

We deal quite well with inbreds.
-Ed



Tyr Wiesner-Hanks

May 2, 2017, 2:30:29 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Alex (or others Googling this error),

You can convert haploid to pseudo-diploid files with a sed workaround. I used the following bash code:

sed 's/\t\./\t.\/./g' haploid.vcf > diploidized.vcf 
for i in {0..9}; do 
        sed -i "s/\t$i:/\t$i\/$i:/g" diploidized.vcf 
done

The first line replaces every . preceded by a tab (haploid missing) with ./. (diploid missing) and writes the result to a new file. The loop then edits that file in place: each number preceded by a tab and followed by a colon (the start of a haploid genotype) becomes the number, a /, and the number again, consistent with diploid calls. If you have more than 10 genotypes in a given line

This works for my files, and I checked a few markers by hand against the original haploid file. It is set up for cases where the GT field is the first entry in the VCF genotype fields, which I think is the norm. The output obviously will not contain properly formatted depths or the like, since each haploid call had only one depth. Not sure if it will work for others.
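
The assumption that GT comes first can be checked before converting. A minimal sketch, assuming a tab-separated VCF with the FORMAT string in column 9; `gt_first_violations` is a hypothetical name, not an existing tool:

```shell
# Hypothetical check: reads a VCF on stdin and prints the line number and
# FORMAT string of every data line whose FORMAT column does not start with GT.
gt_first_violations() {
    awk -F'\t' '!/^#/ && $9 !~ /^GT(:|$)/ { print NR ": " $9 }'
}
```

If `gt_first_violations < haploid.vcf` prints nothing, every data line lists GT first and the sed patterns above are looking at the right entry.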

sarthok rahman

Jul 6, 2017, 9:02:29 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Tyr,

Great solution! I am also working with haploid data and I tried to implement it. However, it did not work for one of my VCFs. I think I have more than 10 genotypes in that file. You started writing "if you have more than 10 genotypes in a given line" ... but the sentence was not completed. If possible, could you please share your thoughts on it? That would be very helpful! Thanks in advance!

Sarthok

Tyr Wiesner-Hanks

Jul 7, 2017, 9:46:43 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hmm, I'm not sure what I was going to write there. If you modify the for loop to be {0..20} or whatever, that should work. It will just get slower linearly with that number.

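One important note: the above code replaces '.' with './.' after every tab, not just in the genotype columns. That includes columns 3 (ID) and 7 (FILTER), so it will change all your site names to './.' when read into TASSEL. Here is my workaround, which leaves the first eight columns untouched:

# Written for GNU sed (-i with no suffix, \t as tab)
# Copy the header lines as-is
grep '^#' haploid.vcf > diploidized.vcf
# Split the data lines into fixed columns 1-8 and FORMAT plus samples (9 onward)
grep -v '^#' haploid.vcf | cut -f 1-8 > tmp1
grep -v '^#' haploid.vcf | cut -f 9- > tmp2
# Diploidize missing calls, then allele indices 0 through 20, in the sample columns only
sed -i 's/\t\./\t.\/./g' tmp2
for i in {0..20}; do
        sed -i "s/\t$i:/\t$i\/$i:/g" tmp2
done
# Rejoin the two halves under the header
paste tmp1 tmp2 >> diploidized.vcf
rm tmp1 tmp2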

I think you could make this faster by using awk to skip the for loop and using csplit to not repeat the grep steps. But it's fast enough for me so I haven't tried to figure that out.
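
For anyone who wants to try the awk route, here is one possible single-pass sketch. It is untested against TASSEL itself and rests on the same assumptions as the sed version: tab-separated columns, samples from column 10 on, and GT as the first entry of each sample field. The function name `diploidize` is made up for illustration.

```shell
# Hypothetical single-pass alternative to the sed loop: reads a haploid VCF on
# stdin and writes a pseudo-diploid VCF to stdout.
diploidize() {
    awk 'BEGIN { FS = OFS = "\t" }
        /^#/ { print; next }                 # copy header lines unchanged
        {
            for (i = 10; i <= NF; i++) {     # only touch sample columns
                gt = $i
                rest = ""
                p = index($i, ":")           # GT ends at the first colon, if any
                if (p > 0) { gt = substr($i, 1, p - 1); rest = substr($i, p) }
                if (gt ~ /^([0-9]+|\.)$/)    # haploid call or haploid missing
                    $i = gt "/" gt rest
            }
            print
        }'
}
```

Usage would be `diploidize < haploid.vcf > diploidized.vcf`. Because it matches any run of digits, it does not depend on a fixed upper bound such as {0..20}, and calls that already contain a / or | are left alone.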

sarthok rahman

Jul 8, 2017, 4:40:29 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Thank you so much, Tyr! It worked perfectly! 