GBSSeqToTagDBPlugin


Polymerase Writer

Jun 14, 2016, 10:53:17 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Dear group,

When I run the GBSSeqToTagDBPlugin, it does not recognise the FASTQ file. I have tried renaming the file to the flowcell_lane format, and it still cannot be read. Any ideas? Thank you. This is the output:

dh198178:tasseladmin-tassel-5-standalone-232aebfe4f63 PGW$ perl ./run_pipeline.pl -debug -Xms10G -Xmx10G -fork1 -GBSSeqToTagDBPlugin -e AvaII -i /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63/Undetermined_S0_L006_R1_001.fastq.gz -db /Users/PGW/Lettuce_GBS/salice.db -k /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63/Keyfile_SaladinxIceberg_UK_pop.txt -kmerLength 64 -minKmerL 20 -mnQS 20 -mxKmerNum 100000000 -endPlugin -runfork1
./lib/ahocorasick-0.2.4.jar:./lib/batik-awt-util.jar:./lib/batik-css.jar:./lib/batik-dom.jar:./lib/batik-ext.jar:./lib/batik-gui-util.jar:./lib/batik-gvt.jar:./lib/batik-parser.jar:./lib/batik-svg-dom.jar:./lib/batik-svggen.jar:./lib/batik-util.jar:./lib/batik-xml.jar:./lib/biojava-alignment-4.0.0.jar:./lib/biojava-core-4.0.0.jar:./lib/biojava-phylo-4.0.0.jar:./lib/cisd-jhdf5-batteries_included_lin_win_mac.jar:./lib/colt.jar:./lib/commons-codec-1.10.jar:./lib/commons-math3-3.4.1.jar:./lib/ejml-0.23.jar:./lib/forester.jar:./lib/geronimo-spec-activation-1.0.2-rc4.jar:./lib/guava-14.0.1.jar:./lib/htsjdk-1.138.jar:./lib/itextpdf-5.1.0.jar:./lib/javax.json-1.0.4.jar:./lib/jcommon-1.0.6.jar:./lib/jfreechart-1.0.3.jar:./lib/json-simple-1.1.1.jar:./lib/junit-4.10.jar:./lib/log4j-1.2.13.jar:./lib/mail-1.4.jar:./lib/poi-3.0.1-FINAL-20070705.jar:./lib/postgresql-9.4-1201.jdbc41.jar:./lib/slf4j-api-1.7.10.jar:./lib/slf4j-simple-1.7.10.jar:./lib/snappy-java-1.1.1.6.jar:./lib/sqlite-jdbc-3.8.5-pre1.jar:./lib/trove-3.0.3.jar:./lib/xercesImpl.jar:./lib/xml.jar:./lib/xmlParserAPIs.jar:./sTASSEL.jar
Memory Settings: -Xms10G -Xmx10G
Tassel Pipeline Arguments: -debug -fork1 -GBSSeqToTagDBPlugin -e AvaII -i /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63/Undetermined_S0_L006_R1_001.fastq.gz -db /Users/PGW/Lettuce_GBS/salice.db -k /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63/Keyfile_SaladinxIceberg_UK_pop.txt -kmerLength 64 -minKmerL 20 -mnQS 20 -mxKmerNum 100000000 -endPlugin -runfork1
[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.16  Date: October 15, 2015
[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 9813 MB
[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_91
[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Mac OS X
[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 8
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -GBSSeqToTagDBPlugin, -e, AvaII, -i, /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63/Undetermined_S0_L006_R1_001.fastq.gz, -db, /Users/PGW/Lettuce_GBS/salice.db, -k, /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63/Keyfile_SaladinxIceberg_UK_pop.txt, -kmerLength, 64, -minKmerL, 20, -mnQS, 20, -mxKmerNum, 100000000, -endPlugin, -runfork1]
net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin: time: Jun 14, 2016 14:42:49
Enzyme: AvaII
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - 
GBSSeqToTagDBPlugin Parameters
i: /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63/Undetermined_S0_L006_R1_001.fastq.gz
k: /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63/Keyfile_SaladinxIceberg_UK_pop.txt
e: AvaII
kmerLength: 64
minKmerL: 20
c: 10
db: /Users/PGW/Lettuce_GBS/salice.db
mnQS: 20
mxKmerNum: 100000000
batchSize: 8
deleteOldData: false

[pool-1-thread-1] ERROR net.maizegenetics.plugindef.AbstractPlugin - -i: Directory: /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63/Undetermined_S0_L006_R1_001.fastq.gz doesn't exist

[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - 
Usage:
GBSSeqToTagDBPlugin <options>
-i <Input Directory> : Input directory containing FASTQ files in text or gzipped text.
     NOTE: Directory will be searched recursively and should
     be written WITHOUT a slash after its name. (required)
-k <Key File> : Key file listing barcodes distinguishing the samples (required)
-e <Enzyme> : Enzyme used to create the GBS library, if it differs from the one listed in the key file (required)
-kmerLength <Maximum Kmer Length> : Specified length for each kmer to process (Default: 64)
-minKmerL <Minimum Kmer Length> : Minimum kmer Length after second cut site is removed (Default: 20)
-c <Min Kmer Count> : Minimum kmer count (Default: 10)
-db <Output Database File> : Output Database File (required)
-mnQS <Minimum quality score> : Minimum quality score within the barcode and read length to be accepted (Default: 0)
-mxKmerNum <Maximum Kmer Number> : Maximum number of kmers (Default: 50000000)
-batchSize <Batch size of fastq files> : Number of flow cells being processed simultaneously (Default: 8)
-deleteOldData <true | false> : Delete existing SNP quality data from db tables (Default: false)

dh198178:tasseladmin-tassel-5-standalone-232aebfe4f63 PGW$ ls
Keyfile_SaladinxIceberg_UK_pop.txt run_anything.bat
TASSELTutorialData run_anything.pl
Tassel5PipelineCLI.pdf run_pipeline.bat
Undetermined_S0_L006_R1_001.fastq.gz run_pipeline.pl
cp.bat sTASSEL.jar
example_pipelines start_tassel.bat
hs_err_pid15996.log start_tassel.pl
hs_err_pid33911.log tassel3-standalone
lib
dh198178:tasseladmin-tassel-5-standalone-232aebfe4f63 PGW$ 

Lynn Carol Johnson

Jun 14, 2016, 11:10:54 AM
to tas...@googlegroups.com
The error message isn’t complaining about the format; it is saying the file doesn’t exist. Could it be you have a typo? Are you on a Linux machine? Try typing the line below to make sure the file exists and you don’t have a typo.

> ls /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63/Undetermined_S0_L006_R1_001.fastq.gz

Lynn

--
You received this message because you are subscribed to the Google Groups "TASSEL - Trait Analysis by Association, Evolution and Linkage" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tassel+un...@googlegroups.com.
To post to this group, send email to tas...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tassel/bf596d2a-6f85-48f7-9cad-c85b00861eaf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Polymerase Writer

Jun 14, 2016, 11:25:52 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage

dh198178:tasseladmin-tassel-5-standalone-232aebfe4f63 PGW$ ls /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63/Undetermined_S0_L006_R1_001.fastq.gz
/Applications/tasseladmin-tassel-5-standalone-232aebfe4f63/Undetermined_S0_L006_R1_001.fastq.gz
dh198178:tasseladmin-tassel-5-standalone-232aebfe4f63 PGW$ ls
Keyfile_SaladinxIceberg_UK_pop.txt cp.bat run_anything.bat start_tassel.bat
TASSELTutorialData example_pipelines run_anything.pl start_tassel.pl
Tassel5PipelineCLI.pdf hs_err_pid15996.log run_pipeline.bat tassel3-standalone
Undetermined_S0_L006_R1_001.fasta hs_err_pid33911.log run_pipeline.pl
Undetermined_S0_L006_R1_001.fastq.gz lib sTASSEL.jar
dh198178:tasseladmin-tassel-5-standalone-232aebfe4f63 PGW$ 

Terry Casstevens

Jun 14, 2016, 11:32:46 AM
to Tassel User Group
-i expects a directory, not a filename. Maybe you need this:

-i /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63
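Because the usage text says -i takes a directory (searched recursively, written without a trailing slash), a path that points at a single FASTQ file has to be replaced by the directory containing it. The Python sketch below does that check before launching the pipeline. `resolve_input_dir` is a hypothetical helper written for illustration; it is not part of TASSEL.

```python
import os


def resolve_input_dir(path: str) -> str:
    """Return a directory suitable for GBSSeqToTagDBPlugin's -i option.

    Hypothetical helper: if `path` is a file, its parent directory is used.
    A trailing slash is dropped because the plugin's usage text asks for the
    directory name WITHOUT a slash after it.
    """
    if os.path.isfile(path):
        # -i must name a directory, so fall back to the file's parent.
        path = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(path):
        raise FileNotFoundError(f"-i: directory {path!r} does not exist")
    # rstrip would turn the root "/" into "", so keep the root as-is.
    stripped = path.rstrip(os.sep)
    return stripped or os.sep
```

If you pass the full `.fastq.gz` path from the first post, the function returns the `tasseladmin-tassel-5-standalone-...` directory that contains it, which is what Terry suggests.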

Polymerase Writer

Jun 15, 2016, 5:54:57 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
I have tried that, Terry; thank you for the advice, though. I get the output below. It looks like it is reading the directory, but no database is created:

dh198178:tasseladmin-tassel-5-standalone-232aebfe4f63 PGW$ perl ./run_pipeline.pl -Xms10G -Xmx10G -fork1 -GBSSeqToTagDBPlugin -e AvaII -i /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63 -db /Users/PGW/Lettuce_GBS/salice.db -k /Users/PGW/Lettuce_GBS/SalxIce_KEY2.txt -kmerLength 64 -minKmerL 20 -mnQS 20 -mxKmerNum 100000000 -endPlugin -runfork1
./lib/ahocorasick-0.2.4.jar:./lib/batik-awt-util.jar:./lib/batik-css.jar:./lib/batik-dom.jar:./lib/batik-ext.jar:./lib/batik-gui-util.jar:./lib/batik-gvt.jar:./lib/batik-parser.jar:./lib/batik-svg-dom.jar:./lib/batik-svggen.jar:./lib/batik-util.jar:./lib/batik-xml.jar:./lib/biojava-alignment-4.0.0.jar:./lib/biojava-core-4.0.0.jar:./lib/biojava-phylo-4.0.0.jar:./lib/cisd-jhdf5-batteries_included_lin_win_mac.jar:./lib/colt.jar:./lib/commons-codec-1.10.jar:./lib/commons-math3-3.4.1.jar:./lib/ejml-0.23.jar:./lib/forester.jar:./lib/geronimo-spec-activation-1.0.2-rc4.jar:./lib/guava-14.0.1.jar:./lib/htsjdk-1.138.jar:./lib/itextpdf-5.1.0.jar:./lib/javax.json-1.0.4.jar:./lib/jcommon-1.0.6.jar:./lib/jfreechart-1.0.3.jar:./lib/json-simple-1.1.1.jar:./lib/junit-4.10.jar:./lib/log4j-1.2.13.jar:./lib/mail-1.4.jar:./lib/poi-3.0.1-FINAL-20070705.jar:./lib/postgresql-9.4-1201.jdbc41.jar:./lib/slf4j-api-1.7.10.jar:./lib/slf4j-simple-1.7.10.jar:./lib/snappy-java-1.1.1.6.jar:./lib/sqlite-jdbc-3.8.5-pre1.jar:./lib/trove-3.0.3.jar:./lib/xercesImpl.jar:./lib/xml.jar:./lib/xmlParserAPIs.jar:./sTASSEL.jar
Memory Settings: -Xms10G -Xmx10G
Tassel Pipeline Arguments: -fork1 -GBSSeqToTagDBPlugin -e AvaII -i /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63 -db /Users/PGW/Lettuce_GBS/salice.db -k /Users/PGW/Lettuce_GBS/SalxIce_KEY2.txt -kmerLength 64 -minKmerL 20 -mnQS 20 -mxKmerNum 100000000 -endPlugin -runfork1
[main] INFO net.maizegenetics.tassel.TasselLogging - Tassel Version: 5.2.16  Date: October 15, 2015
[main] INFO net.maizegenetics.tassel.TasselLogging - Max Available Memory Reported by JVM: 9813 MB
[main] INFO net.maizegenetics.tassel.TasselLogging - Java Version: 1.8.0_91
[main] INFO net.maizegenetics.tassel.TasselLogging - OS: Mac OS X
[main] INFO net.maizegenetics.tassel.TasselLogging - Number of Processors: 8
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Pipeline Arguments: [-fork1, -GBSSeqToTagDBPlugin, -e, AvaII, -i, /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63, -db, /Users/PGW/Lettuce_GBS/salice.db, -k, /Users/PGW/Lettuce_GBS/SalxIce_KEY2.txt, -kmerLength, 64, -minKmerL, 20, -mnQS, 20, -mxKmerNum, 100000000, -endPlugin, -runfork1]
net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Starting net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin: time: Jun 15, 2016 10:52:14
Enzyme: AvaII
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - 
GBSSeqToTagDBPlugin Parameters
i: /Applications/tasseladmin-tassel-5-standalone-232aebfe4f63
k: /Users/PGW/Lettuce_GBS/SalxIce_KEY2.txt
e: AvaII
kmerLength: 64
minKmerL: 20
c: 10
db: /Users/PGW/Lettuce_GBS/salice.db
mnQS: 20
mxKmerNum: 100000000
batchSize: 8
deleteOldData: false

[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - Finished net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin: time: Jun 15, 2016 10:52:14
[pool-1-thread-1] INFO net.maizegenetics.pipeline.TasselPipeline - net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin: time: Jun 15, 2016 10:52:14: progress: 100%
[pool-1-thread-1] INFO net.maizegenetics.plugindef.AbstractPlugin - net.maizegenetics.analysis.gbs.v2.GBSSeqToTagDBPlugin  Citation: Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.
dh198178:tasseladmin-tassel-5-standalone-232aebfe4f63 PGW$

Lynn Carol Johnson

Jun 15, 2016, 7:02:49 AM
to tas...@googlegroups.com
Verify that your key file has flow cell and lane values that match those indicated by your file names. The file names should contain 3, 4, or 5 underscore-delimited values, e.g. flowcell_lane_fastq.txt.gz OR flowcell_s_lane_fastq.txt.gz OR code_flowcell_s_lane_fastq.txt.gz.
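Here is a Python sketch of the naming rule just described. It shows which underscore-delimited field is read as the flow cell and which as the lane for each of the three patterns. It follows the rule as stated in this post, not TASSEL's own source code, and `flowcell_and_lane` is a hypothetical name.

```python
from typing import Optional, Tuple


def flowcell_and_lane(file_name: str) -> Optional[Tuple[str, str]]:
    """Extract (flowcell, lane) from a bare FASTQ file name (no directory).

    Patterns, per the forum post:
      flowcell_lane_fastq.txt.gz          -> 3 fields
      flowcell_s_lane_fastq.txt.gz        -> 4 fields
      code_flowcell_s_lane_fastq.txt.gz   -> 5 fields
    Any other field count returns None.
    """
    fields = file_name.split("_")
    if len(fields) == 3:
        return fields[0], fields[1]
    if len(fields) == 4:
        return fields[0], fields[2]
    if len(fields) == 5:
        return fields[1], fields[3]
    return None
```

Under this rule, an Illumina-style name such as `Undetermined_S0_L006_R1_001_fastq.gz` splits into six fields, so no flow cell or lane would be recovered from it.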

If you still have the problem, please run with the debug option as per the link below and send me the output.


Thanks - Lynn


Marlee Labroo

Jun 23, 2016, 4:46:18 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage

Hi All,

I had a similar problem with the key file: GBSSeqToTagDBPlugin gave an error message saying that it doesn't exist, even though I think the file is named correctly and my code worked on a different library. I am wondering whether it's related to the recent TASSEL update. I'm sure I'll figure this out eventually, but I wanted to add this here in case the two problems are related. I was able to see the key file, with the correct extension, using the dir command, so I think the pipeline is not recognizing it.

Error message:

Terry Casstevens

Jun 23, 2016, 7:06:50 PM
to Tassel User Group
Looks like you specified the wrong directory. There should be a GBS subdirectory?

--
You received this message because you are subscribed to the Google Groups "TASSEL - Trait Analysis by Association, Evolution and Linkage" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tassel+un...@googlegroups.com.
To post to this group, send email to tas...@googlegroups.com.

Lynn Carol Johnson

Jun 24, 2016, 7:06:59 AM
to tas...@googlegroups.com
The problem with the FASTQ file that was included with Marlee’s email was indeed a naming problem. As stated in the error message that accompanied that run, the file was named incorrectly: it should have been named with an underscore before “fastq”, not a period. That is, “Undetermined_S0_L006_R1_001_fastq.gz” instead of “Undetermined_S0_L006_R1_001.fastq.gz”.
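The rename described in this post can be sketched in Python as follows. The helper replaces the period before a trailing `fastq.gz`, `fastq.txt`, or `fastq.txt.gz` with an underscore. A bare `.fastq` ending is left alone because `.fastq` already appears in the list of accepted suffixes from the debug message later in the thread. `suggest_tassel_name` is a hypothetical helper, and it only computes the new name; it does not rename anything on disk.

```python
import re

# Only rewrite ".fastq" when it is followed by .txt, .gz or .txt.gz at the
# end of the name; a bare ".fastq" suffix is already accepted as-is.
_DOTTED_FASTQ = re.compile(r"\.fastq(\.txt\.gz|\.txt|\.gz)$")


def suggest_tassel_name(file_name: str) -> str:
    """Return the file name with "_fastq" in place of ".fastq" (hypothetical)."""
    return _DOTTED_FASTQ.sub(r"_fastq\1", file_name)
```

For example, `suggest_tassel_name("H7LLWBBXX_6.fastq.txt.gz")` gives `H7LLWBBXX_6_fastq.txt.gz`. You could then pass that name to `os.rename` once you have checked it.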

Marlee, please run with the debug option and send us your output.

Thanks - Lynn

Marlee Labroo

Jun 24, 2016, 9:10:57 AM
to tas...@googlegroups.com

Thank you both for your kind help! After correcting the file name, the plugin began running but produced no output. After carefully checking that my key file matched the FASTQ file, I saved the key file as a tab-delimited text file instead of Unicode text, and the plugin ran correctly. Maybe this is a mistake only a beginner would make.
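The "Unicode text" export in spreadsheet programs such as Excel typically writes UTF-16 with a byte-order mark, whereas a plain tab-delimited export does not. The Python sketch below assumes that a UTF-16 BOM was the underlying problem, which the post does not confirm. It detects the BOM and re-saves the key file as UTF-8. `convert_key_file_to_utf8` is a hypothetical helper.

```python
import codecs
from pathlib import Path


def convert_key_file_to_utf8(path: str) -> bool:
    """Re-save a UTF-16 key file (detected by its byte-order mark) as UTF-8.

    Returns True if the file was converted, False if no UTF-16 BOM was found.
    Assumes the key file is small enough to read into memory and that a
    UTF-16 BOM is the only encoding issue being checked for.
    """
    p = Path(path)
    raw = p.read_bytes()
    if not raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return False
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    text = raw.decode("utf-16")
    # Normalise Windows line endings before writing plain UTF-8 (no BOM).
    text = text.replace("\r\n", "\n")
    p.write_text(text, encoding="utf-8")
    return True
```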

Thanks again! -Marlee


Polymerase Writer

Jul 4, 2016, 7:11:17 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
It seems to be running, but still no database is created.
report040716.txt

Lynn Carol Johnson

Jul 4, 2016, 11:28:01 AM
to tas...@googlegroups.com
Did you correct the file name as requested in my June 15th email? I’ve copied that email below:

From: Lynn Carol Johnson <lc...@cornell.edu>
Date: Wednesday, June 15, 2016 at 8:45 AM
To: Polymerase Writer <polymera...@hotmail.com>
Subject: Re: Private message regarding: [TASSEL-Group] Re: GBSSeqToTagDBPlugin


The file name format is incorrect – you need an underscore, not a period, before “fastq”.

[Thread-3] DEBUG net.maizegenetics.plugindef.AbstractPlugin - Couldn't find any files that end with ".fq", ".fq.gz", ".fastq", "_fastq.txt", "_fastq.gz", "_fastq.txt.gz", "_sequence.txt", or "_sequence.txt.gz" in the input directory: /Users/PGW/Lettuce_GBS/fastq
java.lang.IllegalArgumentException: Couldn't find any files that end with ".fq", ".fq.gz", ".fastq", "_fastq.txt", "_fastq.gz", "_fastq.txt.gz", "_sequence.txt", or "_sequence.txt.gz" in the input directory: /Users/PGW/Lettuce_GBS/fastq

The code is looking for H7LLWBBXX_6_fastq.txt.gz

dh198178:fastq PGW$ ls

H7LLWBBXX_6.fastq.txt.gz H7LLWBBXX_6_fastqc.html H7LLWBBXX_6_fastqc.zip
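Before re-running the plugin, you can check which files in the input directory have one of the suffixes listed in the debug message above. This Python sketch walks the directory recursively (the usage text says -i is searched recursively) and keeps only names ending with one of those suffixes. It is an approximation. TASSEL's own matching may differ, for example in case sensitivity.

```python
import os
from typing import List

# Suffixes copied from the TASSEL debug message quoted above.
ACCEPTED_SUFFIXES = (
    ".fq", ".fq.gz", ".fastq",
    "_fastq.txt", "_fastq.gz", "_fastq.txt.gz",
    "_sequence.txt", "_sequence.txt.gz",
)


def find_fastq_files(input_dir: str) -> List[str]:
    """Return (sorted) paths under input_dir whose names end with an accepted suffix."""
    matches = []
    for root, _dirs, files in os.walk(input_dir):
        for name in files:
            if name.endswith(ACCEPTED_SUFFIXES):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

For the three files in the listing above, this returns an empty list, which agrees with the debug message. After a rename to `H7LLWBBXX_6_fastq.txt.gz`, that file would be returned.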



Polymerase Writer

Jul 26, 2016, 6:47:38 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Lynn,
I tried using the file names described in the debug output, and it still did not produce a *.db file. Thank you for your help.
debug260716

Lynn Carol Johnson

Jul 26, 2016, 7:17:43 AM
to tas...@googlegroups.com
If you have renamed the files correctly, check your key file to ensure it has the required headings and that the flow cells and lanes from the file names are represented in the key file. The GBSv2 wiki page has examples of key files and explains the requirements. Please see below:
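The two key-file checks described here can be sketched in Python. The first confirms that the required headings are present. The second confirms that every (flow cell, lane) pair taken from the FASTQ file names appears in the key file. The column names in `REQUIRED_COLUMNS` are an assumption for illustration only, so please confirm them against the GBSv2 wiki page. `check_key_file` is a hypothetical helper.

```python
import csv
import io
from typing import Iterable, List, Set, Tuple

# ASSUMPTION: illustrative column names; confirm the required headings
# against the key-file section of the GBSv2 wiki page.
REQUIRED_COLUMNS = ("Flowcell", "Lane", "Barcode", "FullSampleName")


def check_key_file(key_text: str,
                   fastq_pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Return a list of problems found in a tab-delimited key file.

    `fastq_pairs` holds the (flowcell, lane) pairs parsed from the FASTQ
    file names; each pair should appear in at least one key-file row.
    """
    reader = csv.DictReader(io.StringIO(key_text), delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    # Collect every (flowcell, lane) combination listed in the key file.
    key_pairs: Set[Tuple[str, str]] = {
        ((row.get("Flowcell") or "").strip(), (row.get("Lane") or "").strip())
        for row in reader
    }
    for flowcell, lane in fastq_pairs:
        if (flowcell, lane) not in key_pairs:
            problems.append(f"no key-file rows for flowcell {flowcell}, lane {lane}")
    return problems
```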




Polymerase Writer

Aug 1, 2016, 9:26:13 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Lynn,
Thank you for your help.  I have resolved the issue; it was the labelling.
Best wishes,
Pol

Lynn Carol Johnson

Aug 1, 2016, 9:42:35 AM
to tas...@googlegroups.com
Great! - Glad it’s been fixed.

Lynn
