UNEAK to generate vcf files - StringIndexOutOfBounds error

Rebecca Bloomer

Sep 8, 2014, 8:46:02 PM
to tas...@googlegroups.com
I'm currently working through the following modification of the UNEAK pipeline, as recommended by Katie Hyma (thanks Katie!) to get vcf files from the non-reference pipeline.

1) FastqToTagCountPlugin
2) MergeMultipleTagCountPlugin - if there are multiple lanes of data
3) TagCountToFastqPlugin
4) UTagCountToTagPairPlugin
5) UExportTagPairPlugin
6) FastqToTBTPlugin
7) MergeTagsByTaxaFilesPlugin - if there are multiple lanes of data
8) tbt2vcfPlugin
9) MergeDuplicateSNP_vcf_Plugin

At the MergeTagsByTaxaFilesPlugin step, using the following arguments:
 -MergeTagsByTaxaFilesPlugin -i UNEAK/tagsByTaxa -o UNEAK/tagsByTaxa/mergedTagsByTaxa -s 300000000 -h hapmap -endPlugin -runfork1

I'm getting the following error: 
ERROR net.maizegenetics.pipeline.TasselPipeline - java.lang.StringIndexOutOfBoundsException: String index out of range: -1
java.lang.IllegalArgumentException: TasselPipeline: parseArgs: Unknown parameter: -MergeTagsByTaxaFilesPlugin
at net.maizegenetics.pipeline.TasselPipeline.parseArgs(TasselPipeline.java:1233)
at net.maizegenetics.pipeline.TasselPipeline.<init>(TasselPipeline.java:121)
at net.maizegenetics.pipeline.TasselPipeline.main(TasselPipeline.java:161)


Presumably this is an issue with the TBT files generated at the FastqToTBTPlugin step: I get the same error during the subsequent merge whether I use the master tag count file or the TOPM to generate the TBT files. Any advice on what I'm doing wrong would be appreciated!


Rebecca

Rebecca Bloomer

Sep 11, 2014, 11:34:23 PM
to tas...@googlegroups.com
Got that bit figured out - but now I'm running into a new problem which, interestingly, looks to be the same issue whether I use the TASSEL3 pipeline outlined above or the TASSEL5 UNEAK diversion.

When I try to call SNPs with either the tbt2vcfPlugin in TASSEL3 or the DiscoverySNPCallerPlugin in TASSEL5, I get a similar error. Output starts, the first five tags are shown, it says "begin processing chromosome 1", and then I get java.lang.ExceptionInInitializerError and a host of other error messages, including NullPointerException and NoClassDefFoundError (happy to share them if they're diagnostically useful).

Anyone have any idea what's going wrong?

Terry Casstevens

Sep 12, 2014, 11:53:17 AM
to Tassel User Group
Please send command and all output.

Best,

Terry

Rebecca Bloomer

Sep 14, 2014, 4:02:22 PM
to tas...@googlegroups.com
Hi Terry,

See below. I've removed the list of numbers for the SNP likelihood ratio cutoffs to save space. Any help appreciated!

Tassel Pipeline Arguments: -fork1 -DiscoverySNPCallerPlugin -i UNEAK_T5/05_TagsByTaxa/TBT_HDF5_pivot.h5 -m UNEAK_T5/04_TOPM/topm.bin -o UNEAK_T5/04_TOPM/topm_variants.bin -log UNEAK_T5/07_VCF/logDSCP.log -mnMAF 0.01 -mnLCov 0.1 -sC 1 -eC 1 -endPlugin -runfork1
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Version: 5.1.0  Date: August 28, 2014
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Max Available Memory Reported by JVM: 21845 MB
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Java Version: 1.8.0_20
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -DiscoverySNPCallerPlugin, -i, UNEAK_T5/05_TagsByTaxa/TBT_HDF5_pivot.h5, -m, UNEAK_T5/04_TOPM/topm.bin, -o, UNEAK_T5/04_TOPM/topm_variants.bin, -log, UNEAK_T5/07_VCF/logDSCP.log, -mnMAF, 0.01, -mnLCov, 0.1, -sC, 1, -eC, 1, -endPlugin, -runfork1]
net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin
File = UNEAK_T5/04_TOPM/topm.bin
TagMapFile Row Read:0
Count of Tags=335572
initPhysicalSort
Position index sort begin.
Position index sort end.
[Thread-0] INFO net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin - minTaxaWithLocus:58 MinF:-2.00000 MinMAF:0.0100000 MinMAC:10 

[Thread-0] INFO net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin - includeRare:false includeGaps:false 

[Thread-0] INFO net.maizegenetics.plugindef.AbstractPlugin - 
DiscoverySNPCallerPlugin Parameters
i: UNEAK_T5/05_TagsByTaxa/TBT_HDF5_pivot.h5
y: false
m: UNEAK_T5/04_TOPM/topm.bin
o: UNEAK_T5/04_TOPM/topm_variants.bin
log: UNEAK_T5/07_VCF/logDSCP.log
mnF: -2.0
p: null
mnMAF: 0.01
mnMAC: 10
mnLCov: 0.1
eR: 0.01
ref: null
sC: 1
eC: 1
inclRare: false
inclGaps: false
callBiSNPsWGap: false

[Thread-0] INFO net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin - Finding SNPs in UNEAK_T5/05_TagsByTaxa/TBT_HDF5_pivot.h5.
[Thread-0] INFO net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin - StartChr:1 EndChr:1 

Starting Read Table Sort ...Done in 135ms
initPhysicalSort
Position index sort begin.
Position index sort end.
[Thread-0] INFO net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin - 
As a check, here are the first 5 tags in the TOPM (sorted by position):
TGCAGAAAAAAAAAAAAATTGGTCGGTACCTCCCCCCCAGCCTCCGCCCGTGCGCTTAGAGTCC 64 64 0 * * * * * * * * *
TGCAGAAAAAAAAAAAAATTGGTCGGTACCTCCGCCCCAGCCTCCGCCCGTGCGCTTAGAGTCC 64 64 0 * * * * * * * * *
TGCAGAAAAAAAAAAAACGTTCTTTCAGGCCGGCGGGATGTGCTTGGGGATCTCCGCTGAAGCG 64 1001 1064 0 * * * * * * * *
TGCAGAAAAAAAAAAAACGTTCTTTCAGGCCGGCGGGATGTGCTTGGTGATCTCCGCTGAAGCG 64 1001 1064 0 * * * * * * * *
TGCAGAAAAAAAAAAAACTATGAAATGAAGTGAAGACTTGATCGCTAGGACAAGCAAGATTTGA 64 2001 2064 0 * * * * * * * *
[Thread-0] INFO net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin - 

Processing chromosome 1...


Initializing the cutoffs for quantitative SNP calling likelihood ratio (pHet/pErr) >1

totalReadsForSNPInIndiv minLessTaggedAlleleCountForHet
...

java.util.concurrent.ExecutionException: java.lang.ExceptionInInitializerError
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.biojava3.alignment.Alignments.getListFromFutures(Alignments.java:282)
at org.biojava3.alignment.Alignments.runPairwiseScorers(Alignments.java:602)
at org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:173)
at net.maizegenetics.dna.map.TagLocus.getVariableSites(TagLocus.java:501)
at net.maizegenetics.dna.map.TagLocus.getSNPCallsQuant(TagLocus.java:297)
at net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin.findSNPsInTagLocus(DiscoverySNPCallerPlugin.java:689)
at net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin.discoverSNPsOnChromosome(DiscoverySNPCallerPlugin.java:632)
at net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin.processData(DiscoverySNPCallerPlugin.java:131)
at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:83)
at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1164)
at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
Caused by: java.lang.ExceptionInInitializerError
at org.biojava3.alignment.SimpleAlignedSequence.setLocation(SimpleAlignedSequence.java:358)
at org.biojava3.alignment.SimpleAlignedSequence.<init>(SimpleAlignedSequence.java:88)
at org.biojava3.alignment.SimpleProfile.<init>(SimpleProfile.java:118)
at org.biojava3.alignment.SimpleSequencePair.<init>(SimpleSequencePair.java:86)
at org.biojava3.alignment.SimpleSequencePair.<init>(SimpleSequencePair.java:69)
at org.biojava3.alignment.NeedlemanWunsch.setProfile(NeedlemanWunsch.java:71)
at org.biojava3.alignment.template.AbstractMatrixAligner.align(AbstractMatrixAligner.java:342)
at org.biojava3.alignment.template.AbstractPairwiseSequenceAligner.getPair(AbstractPairwiseSequenceAligner.java:112)
at org.biojava3.alignment.FractionalIdentityScorer.align(FractionalIdentityScorer.java:112)
at org.biojava3.alignment.FractionalIdentityScorer.getScore(FractionalIdentityScorer.java:105)
at org.biojava3.alignment.template.CallablePairwiseSequenceScorer.call(CallablePairwiseSequenceScorer.java:53)
at org.biojava3.alignment.template.CallablePairwiseSequenceScorer.call(CallablePairwiseSequenceScorer.java:38)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at java.util.Collections$UnmodifiableCollection.<init>(Collections.java:1026)
at java.util.Collections$UnmodifiableList.<init>(Collections.java:1302)
at java.util.Collections.unmodifiableList(Collections.java:1287)
at org.biojava3.core.sequence.location.template.AbstractLocation.<init>(AbstractLocation.java:111)
at org.biojava3.core.sequence.location.template.AbstractLocation.<init>(AbstractLocation.java:85)
at org.biojava3.core.sequence.location.SimpleLocation.<init>(SimpleLocation.java:57)
at org.biojava3.core.sequence.location.SimpleLocation.<init>(SimpleLocation.java:53)
at org.biojava3.core.sequence.location.template.Location.<clinit>(Location.java:48)
... 16 more

[Thread-0] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin: progress: 100%
[Thread-0] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.
Exception in thread "Thread-0" java.lang.NoClassDefFoundError: Could not initialize class org.biojava3.core.sequence.location.SimpleLocation
at org.biojava3.alignment.SimpleAlignedSequence.setLocation(SimpleAlignedSequence.java:358)
at org.biojava3.alignment.SimpleAlignedSequence.<init>(SimpleAlignedSequence.java:88)
at org.biojava3.alignment.SimpleProfile.<init>(SimpleProfile.java:118)
at org.biojava3.alignment.SimpleSequencePair.<init>(SimpleSequencePair.java:86)
at org.biojava3.alignment.SimpleSequencePair.<init>(SimpleSequencePair.java:69)
at org.biojava3.alignment.NeedlemanWunsch.setProfile(NeedlemanWunsch.java:71)
at org.biojava3.alignment.template.AbstractMatrixAligner.align(AbstractMatrixAligner.java:342)
at org.biojava3.alignment.template.AbstractPairwiseSequenceAligner.getPair(AbstractPairwiseSequenceAligner.java:112)
at org.biojava3.alignment.FractionalIdentityScorer.align(FractionalIdentityScorer.java:112)
at org.biojava3.alignment.FractionalIdentityScorer.getMaxScore(FractionalIdentityScorer.java:92)
at org.biojava3.alignment.template.AbstractScorer.getDistance(AbstractScorer.java:40)
at org.biojava3.alignment.template.AbstractScorer.getDistance(AbstractScorer.java:35)
at org.biojava3.alignment.GuideTree.<init>(GuideTree.java:80)
at org.biojava3.alignment.Alignments.getMultipleSequenceAlignment(Alignments.java:176)
at net.maizegenetics.dna.map.TagLocus.getVariableSites(TagLocus.java:501)
at net.maizegenetics.dna.map.TagLocus.getSNPCallsQuant(TagLocus.java:297)
at net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin.findSNPsInTagLocus(DiscoverySNPCallerPlugin.java:689)
at net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin.discoverSNPsOnChromosome(DiscoverySNPCallerPlugin.java:632)
at net.maizegenetics.analysis.gbs.DiscoverySNPCallerPlugin.processData(DiscoverySNPCallerPlugin.java:131)
at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:83)
at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1164)
at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
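The two traces above follow a standard JVM pattern. When a class's static initializer throws (here a NullPointerException inside Location.<clinit>), the first attempt to use the class fails with ExceptionInInitializerError (surfacing in a worker thread, so it arrives wrapped in an ExecutionException). The JVM then marks the class as unusable, and every later use fails with NoClassDefFoundError: Could not initialize class, which is what the second trace reports for SimpleLocation. The minimal sketch below reproduces only that error sequence; StaticInitFailureDemo and Fragile are hypothetical stand-ins, not biojava code, and the sketch does not reproduce the biojava bug itself.

```java
public class StaticInitFailureDemo {

    // Hypothetical stand-in for a class whose static initializer throws,
    // playing the role the biojava location classes play in the trace.
    static class Fragile {
        static final String VALUE = init();

        private static String init() {
            String missing = null;
            return missing.trim(); // throws NullPointerException during class initialization
        }
    }

    /** Touches Fragile once and returns whatever was thrown (null if nothing was). */
    public static Throwable touch() {
        try {
            String unused = Fragile.VALUE; // reading the field triggers class initialization
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    public static void main(String[] args) {
        Throwable first = touch();
        // ExceptionInInitializerError, with the NullPointerException as its cause
        System.out.println("first:  " + first);
        System.out.println("cause:  " + first.getCause());

        Throwable second = touch();
        // NoClassDefFoundError: Could not initialize class StaticInitFailureDemo$Fragile
        System.out.println("second: " + second);
    }
}
```

This is why the NullPointerException appears only once in the output even though the alignment code kept calling into the same classes.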


Terry Casstevens

Sep 15, 2014, 10:29:37 AM
to Tassel User Group
This looks like the problem with biojava running under Java 8, although that should have been fixed in Tassel version 5.1.0, as we upgraded the biojava libraries. The libraries being used by run_pipeline.pl should have been listed at the beginning of the output, but I don't see them.
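Terry's diagnosis depends on which JVM runs the pipeline; the log above reports Java Version: 1.8.0_20. A minimal sketch for checking the major version from inside Java follows. The class and method names are hypothetical; it reads the standard java.version system property and handles both the legacy "1.x" numbering (used up to Java 8) and the numbering introduced in Java 9.

```java
public class JavaVersionCheck {

    /**
     * Extracts the major version from a java.version string.
     * Releases up to Java 8 use the legacy "1.x" form (e.g. "1.8.0_20");
     * Java 9 and later start with the major version (e.g. "11.0.2", "17").
     */
    public static int majorVersion(String version) {
        String[] fields = version.split("[._+-]");
        int first = Integer.parseInt(fields[0]);
        if (first == 1 && fields.length > 1) {
            return Integer.parseInt(fields[1]); // legacy scheme: "1.8.0_20" -> 8
        }
        return first; // modern scheme: "11.0.2" -> 11
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version + ", major version = " + majorVersion(version));
    }
}
```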


Rebecca Bloomer

Sep 16, 2014, 7:34:07 PM
to tas...@googlegroups.com
This bit? Left it out by mistake - sorry!

/Applications/tassel5/run_pipeline.pl -Xmx24g -fork1 -DiscoverySNPCallerPlugin -i UNEAK_T5/05_TagsByTaxa/TBT_HDF5_pivot.h5 -m UNEAK_T5/04_TOPM/topm.bin -o UNEAK_T5/04_TOPM/topm_variants.bin -log UNEAK_T5/07_VCF/logDSCP.log -mnMAF 0.01 -mnLCov 0.1 -sC 1 -eC 1 -endPlugin -runfork1

/Applications/tassel5/lib/batik-awt-util.jar:/Applications/tassel5/lib/batik-css.jar:/Applications/tassel5/lib/batik-dom.jar:/Applications/tassel5/lib/batik-ext.jar:/Applications/tassel5/lib/batik-gui-util.jar:/Applications/tassel5/lib/batik-gvt.jar:/Applications/tassel5/lib/batik-parser.jar:/Applications/tassel5/lib/batik-svg-dom.jar:/Applications/tassel5/lib/batik-svggen.jar:/Applications/tassel5/lib/batik-util.jar:/Applications/tassel5/lib/batik-xml.jar:/Applications/tassel5/lib/biojava3-alignment-3.0.jar:/Applications/tassel5/lib/biojava3-core-3.0.jar:/Applications/tassel5/lib/biojava3-phylo-3.0.jar:/Applications/tassel5/lib/cisd-jhdf5-batteries_included_lin_win_mac.jar:/Applications/tassel5/lib/colt.jar:/Applications/tassel5/lib/commons-math-2.2.jar:/Applications/tassel5/lib/ejml-0.23.jar:/Applications/tassel5/lib/forester.jar:/Applications/tassel5/lib/geronimo-spec-activation-1.0.2-rc4.jar:/Applications/tassel5/lib/guava-14.0.1.jar:/Applications/tassel5/lib/itextpdf-5.1.0.jar:/Applications/tassel5/lib/jcommon-1.0.6.jar:/Applications/tassel5/lib/jfreechart-1.0.3.jar:/Applications/tassel5/lib/json-simple-1.1.1.jar:/Applications/tassel5/lib/junit-4.10.jar:/Applications/tassel5/lib/LiuExt.jar:/Applications/tassel5/lib/log4j-1.2.13.jar:/Applications/tassel5/lib/mail-1.4.jar:/Applications/tassel5/lib/poi-3.0.1-FINAL-20070705.jar:/Applications/tassel5/lib/xercesImpl.jar:/Applications/tassel5/lib/xml.jar:/Applications/tassel5/lib/xmlParserAPIs.jar:/Applications/tassel5/sTASSEL.jar

Memory Settings: -Xms512m -Xmx24g
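Terry's question about which libraries are loaded can be answered by filtering the classpath that run_pipeline.pl prints. The sketch below (class and method names hypothetical) splits a classpath string on a path separator (':' on macOS and Linux, as in the listing above) and keeps the entries whose file names start with biojava. It only looks at file names; it does not open the jars or check what is inside them.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class BiojavaJarCheck {

    /** Returns the file names of classpath entries that look like biojava jars. */
    public static List<String> biojavaJars(String classpath, String separator) {
        List<String> found = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            // Keep only the file name, whether the path uses '/' or '\'.
            int slash = Math.max(entry.lastIndexOf('/'), entry.lastIndexOf('\\'));
            String name = entry.substring(slash + 1);
            if (name.startsWith("biojava") && name.endsWith(".jar")) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Pass a pasted classpath as the first argument, or inspect this JVM's own classpath.
        String classpath = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        for (String jar : biojavaJars(classpath, File.pathSeparator)) {
            System.out.println(jar);
        }
    }
}
```

Run on macOS or Linux with the classpath printed above as its argument, it would list biojava3-alignment-3.0.jar, biojava3-core-3.0.jar and biojava3-phylo-3.0.jar.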

Terry Casstevens

Sep 17, 2014, 11:07:58 AM
to Tassel User Group
Dear Rebecca,

I made a mistake building the last release of Tassel 5: I didn't replace the biojava libraries. How did you install Tassel 5? This will be fixed in the next release, but I can send you the correct libraries privately if you want.

Thank you,

Terry


Rebecca Bloomer

Sep 17, 2014, 9:27:10 PM
to tas...@googlegroups.com
It's no worry - I can hold out for the next release as I'm primarily using Tassel4 anyway.