-ProductionSNPCallerPluginV2, Length Test


Stephanie

May 10, 2018, 4:18:21 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hello,

I am attempting to analyse two paired-end datasets together, each with a different read length: one has 100 bp reads, the other 150 bp reads. I have done this successfully in the past by demultiplexing the reads and re-appending barcodes before running them through the Tassel5 GBS v2 pipeline with kmerLength set to 64.

This time around, I wanted to trim the data before processing it with Tassel5. First, I trimmed the demultiplexed reads to remove the Illumina adapter and the cut site, and also ran a custom script to check whether any chimeras were present (see scripts here). Then, I trimmed all the reads down to 65 bp using Skewer so that the reads in both datasets were the same length, to reduce batch effects.

Finally, I interleaved my forward/reverse reads using ShuffleSequences_fasta.pl (found in Velvet), and appended new ApeKI barcodes, generated with GBSX, using a custom script.
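For anyone unfamiliar with the interleaving step, here is a minimal Python sketch of what it does conceptually. It is not the Velvet script: the function names (`fastq_records`, `interleave`) and the toy reads are made up for illustration, and it assumes plain four-line FASTQ records with forward and reverse reads in the same order.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group an iterable of lines into 4-line FASTQ records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        yield record


def interleave(forward: Iterable[str], reverse: Iterable[str]) -> Iterator[str]:
    """Yield forward and reverse records alternately (assumes matching order)."""
    for fwd, rev in zip(fastq_records(forward), fastq_records(reverse)):
        yield from fwd
        yield from rev


# Toy data, invented for this example
r1 = ["@pair1/1", "ACGT", "+", "IIII", "@pair2/1", "TTGA", "+", "IIII"]
r2 = ["@pair1/2", "GGCA", "+", "IIII", "@pair2/2", "CCAT", "+", "IIII"]
merged = list(interleave(r1, r2))
print(merged[0::4])  # headers in interleaved order
```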

When I run these fastq files through the Tassel5 GBS v2 pipeline, I get through all plugins except -ProductionSNPCallerPluginV2, where I get the following output (see the ERROR lines and the two stack traces):

Memory Settings: -Xms512m -Xmx16G
Tassel Pipeline Arguments: -fork1 -ProductionSNPCallerPluginV2 -db /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/2016_2018_Kaki.db -e ApeKI -i /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018 -k /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/keyfile.barcoded.txt -kmerLength 64 -o /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/GBS_2016_2018_9May2018.vcf -endPlugin -runfork1
[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.43  Date: February 22, 2018
[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 14564 MB
[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_162
[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Linux
[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 8
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -ProductionSNPCallerPluginV2, -db, /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/2016_2018_Kaki.db, -e, ApeKI, -i, /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018, -k, /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/keyfile.barcoded.txt, -kmerLength, 64, -o, /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/GBS_2016_2018_9May2018.vcf, -endPlugin, -runfork1]
net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2: time: May 10, 2018 0:11:21


Enzyme: ApeKI
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - 
ProductionSNPCallerPluginV2 Parameters
i: /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018
k: /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/keyfile.barcoded.txt
e: ApeKI
db: /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/2016_2018_Kaki.db
o: /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/GBS_2016_2018_9May2018.vcf
eR: 0.01
d: 0
ko: false
do: true
kmerLength: 64
minPosQS: 0.0
batchSize: 8
mnQS: 0

size of all tags in tag table=2370966
size of all tissues in tissue table=0
size of all tags in mappingApproach table=2
size of all taxa in taxa table=141
ProductionSNPCallerPluginV2: Total batches to process: 1
size of all positions in snpPosition table=96277
[pool-1-thread-1] INFO net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - 
Output VCF file: 
/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/GBS_2016_2018_9May2018.vcf 
created for genotypes from this run.
size of all positions in snpPosition table=96277
size of all alleles in allele table=194649

Start processing batch 1
Enzyme: ApeKI
Enzyme: ApeKI
[pool-1-thread-1] INFO net.maizegenetics.analysis.gbs.v2.GBSUtils - /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/HKFWCCXY_8_fastq.fq: Quality score base:33
[ForkJoinPool.commonPool-worker-1] INFO net.maizegenetics.analysis.gbs.v2.GBSUtils - /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/H2J2MBCXY_1_fastq.fq: Quality score base:33
[pool-1-thread-1] ERROR net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - Good Barcodes Read: 10
[ForkJoinPool.commonPool-worker-1] ERROR net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - Good Barcodes Read: 16
java.lang.StringIndexOutOfBoundsException: 

ERROR processing /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/HKFWCCXY_8_fastq.fq
Reading entry number 11 fails the length test.
Sequence length 63 minus barcode length 10 is less than kmerLength 64.
Re-run your files with either a shorter kmerLength value or a higher minimum quality score.

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processFastQ(ProductionSNPCallerPluginV2.java:362)
at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processFastQFile(ProductionSNPCallerPluginV2.java:327)
at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.lambda$processData$2(ProductionSNPCallerPluginV2.java:252)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processData(ProductionSNPCallerPluginV2.java:250)
at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:112)
at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1837)
at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
java.lang.StringIndexOutOfBoundsException: 

ERROR processing /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/H2J2MBCXY_1_fastq.fq
Reading entry number 17 fails the length test.
Sequence length 65 minus barcode length 9 is less than kmerLength 64.
Re-run your files with either a shorter kmerLength value or a higher minimum quality score.

at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processFastQ(ProductionSNPCallerPluginV2.java:362)
at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.processFastQFile(ProductionSNPCallerPluginV2.java:327)
at net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2.lambda$processData$2(ProductionSNPCallerPluginV2.java:252)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

Finished processing batch 1
[pool-1-thread-1] INFO net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - 
Writing ReadsPerSample log file...
[pool-1-thread-1] INFO net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - ReadsPerSample log file: /media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/keyfile.barcoded_ReadsPerSample.log
[pool-1-thread-1] INFO net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 - 

Total number of SNPs processed with minimum quality score 0 was 96277.

[pool-1-thread-1] INFO net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2 -    ...done

[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2: time: May 10, 2018 0:11:39
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2: time: May 10, 2018 0:11:39: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.analysis.gbs.v2.ProductionSNPCallerPluginV2  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.

What is a Length Test? When creating the original tag DB, I specified -minKmerL 20 and -kmerLength 64, but maybe there are kmers shorter than 20 in my dataset anyway due to over-trimming? If anyone has thoughts, ideas, or suggestions, they would be greatly appreciated.

Cheers,

Stephanie 

PS - I have posted all the commands from this run below:

'/home/stephanie/TASSEL5/run_pipeline.pl' -Xmx16G -fork1 -GBSSeqToTagDBPlugin -e ApeKI -i '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018' -db '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/2016_2018_Kaki.db' -k '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/keyfile.barcoded.txt' -kmerLength 64 -minKmerL 20 -mnQS 20 -mxKmerNum 100000000 -endPlugin -runfork1 

'/home/stephanie/TASSEL5/run_pipeline.pl' -Xmx16G -fork1 -TagExportToFastqPlugin -db '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/2016_2018_Kaki.db' -o '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/tagsForAlign.fa.gz' -c 1 -endPlugin -runfork1 

bwa aln -t4 '/media/stephanie/Olivia/Genomes/SuperScaffolds/superscaffolds_chromosome1.fasta' '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/tagsForAlign.fa.gz' > '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/tagsForAlign.sai' 

bwa samse '/media/stephanie/Olivia/Genomes/SuperScaffolds/superscaffolds_chromosome1.fasta' '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/tagsForAlign.sai' '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/tagsForAlign.fa.gz' > '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/tagsForAlign.sam' 

'/home/stephanie/TASSEL5/run_pipeline.pl' -Xmx16G -fork1 -SAMToGBSdbPlugin -i '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/tagsForAlign.sam' -db '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/2016_2018_Kaki.db' -aProp 0.0 -aLen 0 -endPlugin -runfork1 

'/home/stephanie/TASSEL5/run_pipeline.pl' -Xmx16G -fork1 -DiscoverySNPCallerPluginV2 -db '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/2016_2018_Kaki.db' -sC 1 -eC 1 -mnLCov 0.1 -mnMAF 0.05 -deleteOldData true -endPlugin -runfork1 

'/home/stephanie/TASSEL5/run_pipeline.pl' -Xmx16G -fork1 -ProductionSNPCallerPluginV2 -db '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/2016_2018_Kaki.db' -e ApeKI -i '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018' -k '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/keyfile.barcoded.txt' -kmerLength 64 -o '/media/stephanie/External4/GBS_2016_2018_Combined/9May2018/9May2018/GBS_2016_2018_9May2018.vcf' -endPlugin -runfork1 


Lynn Carol Johnson

May 10, 2018, 7:13:28 AM
to tas...@googlegroups.com

Hi Stephanie –

 

The “kmerLength” parameter also acts as a minimum-length requirement: after the barcode is removed, the remaining sequence must be at least kmerLength bases long. You specified 64 for this length. Per the error messages in your log, some of your sequences were shorter than 64 after the barcode was subtracted. This is the length test.
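To make the arithmetic concrete, here is a small Python sketch of the check as described above. It is only a restatement of the rule in the error message, not TASSEL's actual code, and the function name `passes_length_test` is made up for illustration. The two input pairs are the sequence and barcode lengths reported in the log.

```python
def passes_length_test(sequence_length: int, barcode_length: int, kmer_length: int) -> bool:
    """True when the read, minus its barcode, is at least kmer_length bases long."""
    return sequence_length - barcode_length >= kmer_length


# (sequence length, barcode length) taken from the two error messages in the log
for seq_len, bc_len in [(63, 10), (65, 9)]:
    remaining = seq_len - bc_len
    print(f"{seq_len} - {bc_len} = {remaining}; passes with kmerLength 64: "
          f"{passes_length_test(seq_len, bc_len, 64)}")
```

Both reads fail with kmerLength 64 because only 53 and 56 bases remain once the barcode is removed; by the same rule they would pass with a kmerLength of 32.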

 

The minimum kmer length (-minKmerL) applies to the length of the sequence after the second cut site is removed. You specified 20 for that length, and it does not show up as a problem in your log.

 

Lynn


Stephanie

May 10, 2018, 4:44:12 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Thank you for the kind and speedy response, Lynn! This makes sense. Since I have some tags that were trimmed quite heavily (i.e., <32bp), I think the best way around this is to filter my dataset so the minimum length is 32, then reduce my kmerLength to 32 to match.
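As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes plain four-line FASTQ records, and it subtracts a barcode length before comparing, because per Lynn's explanation the length test is applied after the barcode is removed. The function names and the single `barcode_length` argument are assumptions made for the example; the barcodes in the log are 9 and 10 bases long, so passing the longest barcode length would be the conservative choice.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str, str]  # header, sequence, separator, quality


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; an incomplete trailing record is dropped."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip pulls four consecutive lines from the same iterator per record
    yield from zip(stripped, stripped, stripped, stripped)


def filter_by_length(records: Iterable[FastqRecord],
                     barcode_length: int,
                     min_length: int) -> Iterator[FastqRecord]:
    """Keep reads whose sequence, minus the barcode, has at least min_length bases."""
    for record in records:
        if len(record[1]) - barcode_length >= min_length:
            yield record


# Toy reads, invented for this example
example = [
    "@read1", "A" * 42, "+", "I" * 42,  # 42 - 10 = 32, kept
    "@read2", "A" * 41, "+", "I" * 41,  # 41 - 10 = 31, dropped
]
kept = list(filter_by_length(read_fastq(example), barcode_length=10, min_length=32))
print([record[0] for record in kept])
```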

Thank you again for the help!

Cheers,

Stephanie 
