Production SNP Caller PluginV2 error (Good Barcodes Read and String index out of range)


Arthur Melo

Jun 25, 2015, 1:04:04 PM
to tas...@googlegroups.com
Hi Tassel team, I hope you are well...

For the past couple of weeks I have been running TASSEL 5.2.11 on a Unix server. As far as I can tell, all the earlier steps work normally: my SNPs are discovered, and SNP Quality Profiler estimates many statistics for them. However, when I run the last step of the GBS pipeline (Production SNP Caller Plugin), error messages are printed. Each one pairs a "Good Barcodes Read" line with a "String index out of range" exception. Please look:

 Tassel Pipeline Arguments: -debug -fork1 -ProductionSNPCallerPluginV2 -db GBSV2.db -e PstI-MspI -i ./../HQ_data -k arguta_key.txt -o SNPs2.h5 -ko true -do true -mxTagL 64 -endPlugin -runfork1

[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.11  Date: June 4, 2015

[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 1365 MB

[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_45

[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Linux

[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -ProductionSNPCallerPluginV2, -db, GBSV2.db, -e, PstI-MspI, -i, ./../HQ_data, -k, arguta_key.txt, -o, SNPs2.h5, -ko, true, -do, true, -mxTagL, 64, -endPlugin, -runfork1]

net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2



[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - 

ProductionSNPCallerPluginV2 Parameters

i: ./../HQ_data

k: arguta_key.txt

e: PstI-MspI

db: GBSV2.db

o: SNPs2.h5

eR: 0.01

d: 0

ko: true

do: true

mxTagL: 64

minPosQS: 0.0

batchSize: 8

mnQS: 0


size of all tags in tag table=154792

size of all tags in mappingApproach table=2

size of all taxa in taxa table=54

ProductionSNPCallerPluginV2: Total batches to process: 1

size of all positions in snpPosition table=60228

[pool-1-thread-1] INFO net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - 

The target HDF5 file:

  SNPs2.h5

does not exist. A new HDF5 file of that name will be created 

to hold the genotypes from this run.

size of all positions in snpPosition table=60228

size of all alleles in allele table=123036


Start processing batch 1

Enzyme: PstI-MspI

Enzyme: PstI-MspI

Enzyme: PstI-MspI

Enzyme: PstI-MspI

[ForkJoinPool.commonPool-worker-25] INFO net.maizegenetics.analysis.gbs.v2.GBSUtils - /home/arthur/kiwi_GBS_project/Paper1_Bioinfo/TA/arg64_BWAmem/./../HQ_data/H0WE7ADXX_2_fastq.txt: Quality score base:33

[ForkJoinPool.commonPool-worker-18] INFO net.maizegenetics.analysis.gbs.v2.GBSUtils - /home/arthur/kiwi_GBS_project/Paper1_Bioinfo/TA/arg64_BWAmem/./../HQ_data/H8DUYADXX_2_fastq.txt: Quality score base:33

[pool-1-thread-1] INFO net.maizegenetics.analysis.gbs.v2.GBSUtils - /home/arthur/kiwi_GBS_project/Paper1_Bioinfo/TA/arg64_BWAmem/./../HQ_data/H94NLADXX_1_fastq.txt: Quality score base:33

[ForkJoinPool.commonPool-worker-11] INFO net.maizegenetics.analysis.gbs.v2.GBSUtils - /home/arthur/kiwi_GBS_project/Paper1_Bioinfo/TA/arg64_BWAmem/./../HQ_data/H8DUYADXX_1_fastq.txt: Quality score base:33

[ForkJoinPool.commonPool-worker-25] ERROR net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - Good Barcodes Read: 2

java.lang.StringIndexOutOfBoundsException: String index out of range: 69

at java.lang.String.substring(String.java:1951)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processFastQ(ProductionSNPCallerPluginV2.java:289)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processFastQFile(ProductionSNPCallerPluginV2.java:266)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.lambda$processData$88(ProductionSNPCallerPluginV2.java:207)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2$$Lambda$8/1507544660.accept(Unknown Source)

at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)

at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)

at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)

at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)

at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)

at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)

at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)

at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1689)

at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

[pool-1-thread-1] ERROR net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - Good Barcodes Read: 0

java.lang.StringIndexOutOfBoundsException: String index out of range: 74

at java.lang.String.substring(String.java:1951)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processFastQ(ProductionSNPCallerPluginV2.java:289)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processFastQFile(ProductionSNPCallerPluginV2.java:266)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.lambda$processData$88(ProductionSNPCallerPluginV2.java:207)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2$$Lambda$8/1507544660.accept(Unknown Source)

at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)

at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)

at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)

at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)

at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)

at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)

at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)

at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)

at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)

at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)

at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)

at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)

at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processData(ProductionSNPCallerPluginV2.java:206)

at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:97)

at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1481)

at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

[ForkJoinPool.commonPool-worker-18] ERROR net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - Good Barcodes Read: 0

java.lang.StringIndexOutOfBoundsException: String index out of range: 70

at java.lang.String.substring(String.java:1951)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processFastQ(ProductionSNPCallerPluginV2.java:289)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processFastQFile(ProductionSNPCallerPluginV2.java:266)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.lambda$processData$88(ProductionSNPCallerPluginV2.java:207)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2$$Lambda$8/1507544660.accept(Unknown Source)

at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)

at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)

at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)

at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)

at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)

at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)

at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)

at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1689)

at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

[ForkJoinPool.commonPool-worker-11] ERROR net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - Good Barcodes Read: 1

java.lang.StringIndexOutOfBoundsException: String index out of range: 70

at java.lang.String.substring(String.java:1951)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processFastQ(ProductionSNPCallerPluginV2.java:289)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processFastQFile(ProductionSNPCallerPluginV2.java:266)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.lambda$processData$88(ProductionSNPCallerPluginV2.java:207)

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2$$Lambda$8/1507544660.accept(Unknown Source)

at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)

at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)

at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)

at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)

at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)

at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)

at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)

at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1689)

at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)


Finished processing batch 1

[pool-1-thread-1] INFO net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - 

Writing ReadsPerSample log file...

[pool-1-thread-1] INFO net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - ReadsPerSample log file: /home/arthur/kiwi_GBS_project/Paper1_Bioinfo/TA/arg64_BWAmem/arguta_ReadsPerSample.log

[pool-1-thread-1] INFO net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - 


Total number of SNPs processed with minimum quality score 0 was 282861.


[pool-1-thread-1] INFO net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 -    ...done


[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2: progress: 100%

[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.


It's interesting that the same error shows up four times, which is the number of different libraries I used.

Running with the -debug option, the output is:

ReadsPerSample.log (debug output):

FullSampleName     goodBarcodedReads  goodReadsMatchedToDataBase
40537C             0                  0
74_46              0                  0
74_9               0                  0
Ananasnaya         0                  0
ChangBaiMountain2  0                  0
ChangBaiMountain3  0                  0
ChangBaiMountain4  0                  0
ChangBaiMountain5  0                  0
CherryBomb         0                  0
Chico              0                  0
Cordifolia1        0                  0
Cornell            0                  0
DACT213            0                  0
DACT216            0                  0
DACT217            0                  0


Looking at the debug file (ReadsPerSample.log), the numbers of good barcoded reads and of good reads matched to the database are zero or one for all my genotypes. What I don't understand is how step 1 (GBS Seq To Tag DB) works normally while this last step does not.


The command line I used was:


/home/arthur/kiwi_GBS_project/analysis/TASSEL/tassel-5-standalone/run_pipeline.pl -debug -fork1 -ProductionSNPCallerPluginV2 -db GBSV2.db -e PstI-MspI -i ./../HQ_data -k arguta_key.txt -o SNPs.h5 -ko true -do true -mxTagL 64 -endPlugin -runfork1


I was wondering if someone could advise me on this issue.


Thank you very much.

Lynn Carol Johnson

Jun 26, 2015, 7:04:54 AM
to tas...@googlegroups.com
Hi Arthur -

This error occurs when the sequence length minus the barcode length is less than the specified maximum tag length (mxTagL). You need to re-run your files with either a smaller value for the mxTagL parameter or a higher minimum quality score. GBSSeqToTagDBPlugin has code to check for this, but ProductionSNPCallerPluginV2 does not. In GBSSeqToTagDBPlugin, we stop processing and print a message. I should add that code to the Production SNP Caller.
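
For illustration, here is a minimal Java sketch of that length check. It is not the actual TASSEL source: the class and method names (TagLengthCheck, extractTag) are made up, and it assumes the tag is taken as the mxTagL bases that immediately follow the barcode. Under that assumption, the indexes reported in the exceptions (69, 70, 74) would correspond to barcode lengths of 5, 6, and 10 plus an mxTagL of 64. That reading is an inference from the stack trace, not something confirmed against the code.

```java
import java.util.Optional;

// Hypothetical helper, NOT part of TASSEL: shows the kind of guard that
// avoids a StringIndexOutOfBoundsException on short reads.
class TagLengthCheck {

    /**
     * Returns the maxTagLength bases that follow the barcode, or an empty
     * Optional when the read is too short to hold a full-length tag.
     */
    static Optional<String> extractTag(String read, int barcodeLength, int maxTagLength) {
        int end = barcodeLength + maxTagLength;
        if (barcodeLength < 0 || maxTagLength < 0 || end > read.length()) {
            // An unguarded read.substring(barcodeLength, end) would throw here.
            return Optional.empty();
        }
        return Optional.of(read.substring(barcodeLength, end));
    }

    public static void main(String[] args) {
        // Build dummy reads of 68 and 69 bases (Java 8 compatible).
        String shortRead = new String(new char[68]).replace('\0', 'A');
        String longEnoughRead = new String(new char[69]).replace('\0', 'A');

        // Assumed values: 5-base barcode, mxTagL of 64.
        System.out.println(extractTag(shortRead, 5, 64).isPresent());      // false
        System.out.println(extractTag(longEnoughRead, 5, 64).isPresent()); // true
    }
}
```

With these assumed inputs, a 68-base read carrying a 5-base barcode is rejected, while a 69-base read yields a full 64-base tag.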

Lynn


Arthur Melo

Jun 26, 2015, 11:38:55 AM
to tas...@googlegroups.com, lc...@cornell.edu
Thank you very much, Lynn, for your answer.
It made me think about the read length distribution in my dataset. Does TASSEL require a roughly normal read length distribution? Does it work well with an exponential distribution, ranging from short (30 bp) to long (150 bp) reads?
Probably what I need to do is discard all the short reads and keep only the reads that satisfy: read length - barcode length >= mxTagL.
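
A minimal sketch of such a pre-filter in Java, assuming plain four-line FASTQ records. ShortReadFilter and filter are hypothetical names, not part of TASSEL, and the threshold to pass in (for example, the longest barcode length plus mxTagL) is an assumption that would need checking against the key file. Lowering mxTagL, as suggested in the reply above, avoids discarding reads at all.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical pre-filter, NOT part of TASSEL: keeps only FASTQ records
// whose sequence line has at least minLength bases.
class ShortReadFilter {

    /** Copies qualifying 4-line FASTQ records to out; returns how many were kept. */
    static int filter(Reader in, Writer out, int minLength) {
        try {
            BufferedReader reader = new BufferedReader(in);
            int kept = 0;
            String header;
            while ((header = reader.readLine()) != null) {
                if (header.isEmpty()) {
                    continue; // tolerate blank lines between records
                }
                String sequence = reader.readLine();
                String separator = reader.readLine();
                String quality = reader.readLine();
                if (sequence == null || separator == null || quality == null) {
                    throw new IOException("Truncated FASTQ record: " + header);
                }
                if (sequence.length() >= minLength) {
                    out.write(header + "\n" + sequence + "\n" + separator + "\n" + quality + "\n");
                    kept++;
                }
            }
            out.flush();
            return kept;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two toy records: one of 8 bases, one of 3 bases.
        String fastq = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACG\n+\nIII\n";
        StringWriter out = new StringWriter();
        int kept = filter(new StringReader(fastq), out, 8);
        System.out.println(kept + " record(s) kept");
        System.out.print(out);
    }
}
```

On real data the readers and writers would wrap the FASTQ files, and minLength would be something like 74 for a 10-base barcode and an mxTagL of 64 (again, an assumed threshold).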

Thank you very much for your quick answer.

Regards ...