FastqToTBTPlugin error message

Yang Zhang

Jul 14, 2014, 5:56:09 PM
to tas...@googlegroups.com
Dear Group,

I am running the latest TASSEL 5 native installer on Ubuntu with Java 1.7.

I have a problem with the FastqToTBTPlugin command. Here is the log I got.

run_pipeline.pl -fork1 -FastqToTBTPlugin -i fastq -k key.txt -e BamHI-MluCI -o tbt -y -t tagCounts/FCC4LKRACXX_s_2.cnt -endPlugin -runfork1


/home/hbz-heep438-ubuntu/tassel5/lib/commons-math-2.2.jar:/home/hbz-heep438-ubuntu/tassel5/lib/ejml-0.23.jar:/home/hbz-heep438-ubuntu/tassel5/lib/biojava3-alignment-3.0.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-ext.jar:/home/hbz-heep438-ubuntu/tassel5/lib/guava-14.0.1.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-gvt.jar:/home/hbz-heep438-ubuntu/tassel5/lib/xml.jar:/home/hbz-heep438-ubuntu/tassel5/lib/poi-3.0.1-FINAL-20070705.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-util.jar:/home/hbz-heep438-ubuntu/tassel5/lib/jfreechart-1.0.3.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-dom.jar:/home/hbz-heep438-ubuntu/tassel5/lib/jcommon-1.0.6.jar:/home/hbz-heep438-ubuntu/tassel5/lib/sTASSEL.jar:/home/hbz-heep438-ubuntu/tassel5/lib/forester.jar:/home/hbz-heep438-ubuntu/tassel5/lib/biojava3-phylo-3.0.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-parser.jar:/home/hbz-heep438-ubuntu/tassel5/lib/xmlParserAPIs.jar:/home/hbz-heep438-ubuntu/tassel5/lib/LiuExt.jar:/home/hbz-heep438-ubuntu/tassel5/lib/xercesImpl.jar:/home/hbz-heep438-ubuntu/tassel5/lib/colt.jar:/home/hbz-heep438-ubuntu/tassel5/lib/cisd-jhdf5-batteries_included_lin_win_mac.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-awt-util.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-svggen.jar:/home/hbz-heep438-ubuntu/tassel5/lib/junit-4.10.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-xml.jar:/home/hbz-heep438-ubuntu/tassel5/lib/geronimo-spec-activation-1.0.2-rc4.jar:/home/hbz-heep438-ubuntu/tassel5/lib/log4j-1.2.13.jar:/home/hbz-heep438-ubuntu/tassel5/lib/mail-1.4.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-svg-dom.jar:/home/hbz-heep438-ubuntu/tassel5/lib/biojava3-core-3.0.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-css.jar:/home/hbz-heep438-ubuntu/tassel5/lib/itextpdf-5.1.0.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-gui-util.jar:/home/hbz-heep438-ubuntu/tassel5/sTASSEL.jar
Memory Settings: -Xms1536m -Xmx16g
Tassel Pipeline Arguments: -fork1 -FastqToTBTPlugin -i fastq -k key.txt -e BamHI-MluCI -o tbt -y -t tagCounts/FCC4LKRACXX_s_2.cnt -endPlugin -runfork1
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Version: 5.0.8  Date: June 26, 2014
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Max Available Memory Reported by JVM: 14563 MB
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Java Version: 1.7.0_55
[main] INFO net.maizegenetics.analysis.gbs.FastqToTBTPlugin - FastqToTBTPlugin: setParameters: Using the following fastq files:
[main] INFO net.maizegenetics.analysis.gbs.FastqToTBTPlugin - /home/hbz-heep438-ubuntu/tassel5/fastq/FCC4LKRACXX_s_2_fastq.txt.gz
Reading Haplotypes distribution from:tagCounts/FCC4LKRACXX_s_2.cnt
Number of Tags in file:349183
net.maizegenetics.analysis.gbs.FastqToTBTPlugin

Working on fastq file: /home/hbz-heep438-ubuntu/tassel5/fastq/FCC4LKRACXX_s_2_fastq.txt.gz
Enzyme: BamHI-MluCI
TGACGCCA 1:FCC4LKRACXX:2:94L25-parent1
Total barcodes found in lane:1
Total barcodes found in lane:1
Catch testBasicPipeline c=1 e=java.lang.NullPointerException
___ccceegggggihihihdgbdfgfhad`eef_fefhgbfffhdcgdghfhiiff_fhhfh`dbgfdgdXZa`a_abccccc`T_b_bccccbdccccb
java.lang.NullPointerException
    at net.maizegenetics.dna.tag.AbstractTagsByTaxa.getIndexOfTaxaName(AbstractTagsByTaxa.java:54)
    at net.maizegenetics.analysis.gbs.FastqToTBTPlugin.matchTagsToTaxa(FastqToTBTPlugin.java:306)
    at net.maizegenetics.analysis.gbs.FastqToTBTPlugin.performFunction(FastqToTBTPlugin.java:54)
    at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1145)
    at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)
Timing process (writing TagsByTaxa file)...
0 tags will be output to FCC4LKRACXX_s_2.tbt.byte
Tags written to:tbt/FCC4LKRACXX_s_2.tbt.byte
Number of tags in file:0
...process (writing TagsByTaxa file) took 17 milliseconds.
Total number of reads in lane=1
Total number of good, barcoded reads=1
Finished reading 1 of 1 sequence files: /home/hbz-heep438-ubuntu/tassel5/fastq/FCC4LKRACXX_s_2_fastq.txt.gz


Please help. Thanks.


Jeff Glaubitz

Jul 15, 2014, 12:15:32 PM
to tas...@googlegroups.com

Hi Yang,

 

> Total barcodes found in lane:1

Is there really only one barcode (TGACGCCA) in your lane?  Please be sure that your Key file conforms to the format provided in Appendix 1, here:

http://www.maizegenetics.net/tassel/docs/TasselPipelineGBS.pdf

The names of the column headers and the order of the columns matter.  Make sure that the line endings in the key file (Mac, Windows, Linux) correspond to the operating system that you are running the pipeline on.
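
For the line-ending check, here is a minimal Python sketch (not part of TASSEL; the function names are hypothetical). It counts each line-ending style in the raw bytes of a file and converts Windows (CRLF) and old Mac (CR) endings to Linux (LF). A tool such as dos2unix should do the same job.

```python
def detect_line_endings(data: bytes) -> dict:
    """Count each line-ending style found in the raw bytes of a file."""
    crlf = data.count(b"\r\n")
    return {
        "windows_crlf": crlf,
        "old_mac_cr": data.count(b"\r") - crlf,  # bare CR, not followed by LF
        "unix_lf": data.count(b"\n") - crlf,     # bare LF, not preceded by CR
    }


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src_path: str, dst_path: str) -> dict:
    """Write an LF-only copy of src_path to dst_path; return the counts found."""
    with open(src_path, "rb") as handle:  # binary mode: no newline translation
        data = handle.read()
    with open(dst_path, "wb") as handle:
        handle.write(to_unix_line_endings(data))
    return detect_line_endings(data)
```

For example, convert_file("key.txt", "key_unix.txt") would leave the original untouched and report how many Windows or Mac line endings it contained (both file names are placeholders).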

 

Please note that in TASSEL 5, the LibraryPrepID column (column H) is mandatory, but it can be alphanumeric (i.e., it doesn’t have to be an integer).  The LibraryPrepID is “a unique ID for every Sample/Barcode/Well combination (where Well = Row+Column) on the library prep plate (the plate on which the DNAs were combined with the barcode and common adapters). These LibraryPrepID’s are used to facilitate merging of the TagsByTaxa counts from replicate runs of the same library preps (on multiple flow cell lanes).”

 

Please note also that the name of the key file should end with “_key.txt” (e.g., FCC4LKRACXX_key.txt or MyStudyName_key.txt, where you replace “MyStudyName” with an appropriate name).

 

Additional columns (past column H) are optional, and can contain any extra information about the samples that is useful to you.
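
The key file points above can be checked with a short script before running the pipeline. This is only a sketch under stated assumptions: it assumes the key file is tab-delimited, it checks only the rules mentioned in this message (file name suffix, line endings, LibraryPrepID in column H), and the function name is hypothetical. It does not verify the other header names; for those, compare against Appendix 1.

```python
def check_key_file(name: str, text: str) -> list:
    """Return a list of human-readable problems found in a key file.

    `text` should be read with open(path, newline="") so that Python does
    not silently translate Windows or Mac line endings.
    """
    problems = []
    if not name.endswith("_key.txt"):
        problems.append("file name does not end with '_key.txt'")
    if "\r" in text:
        problems.append("file contains carriage returns (Windows or Mac line endings)")

    # Assumption: columns are separated by tabs. Blank lines are skipped.
    records = [line.split("\t") for line in text.splitlines() if line.strip()]
    if not records:
        problems.append("file is empty")
        return problems

    header = records[0]
    if len(header) < 8 or header[7].strip() != "LibraryPrepID":
        problems.append("column H (the 8th column) is not headed 'LibraryPrepID'")

    # Every sample record needs a non-empty LibraryPrepID in column H.
    for number, record in enumerate(records[1:], start=2):
        if len(record) < 8 or not record[7].strip():
            problems.append(f"record {number}: LibraryPrepID is missing")
    return problems
```

An empty list means none of these particular problems were found; it does not guarantee that the file is otherwise valid.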

 

> java.lang.NullPointerException at net.maizegenetics.dna.tag.AbstractTagsByTaxa.getIndexOfTaxaName(AbstractTagsByTaxa.java:54)

It looks like the TagsByTaxaByte data structure has not kept up with changes that have been made in TASSEL 5 (so it is currently obsolete).  Try using a TagsByTaxaHDF5 instead.  In other words, replace the FastqToTBTPlugin (and, possibly, the MergeTagsByTaxaFilesPlugin) with the SeqToTBTHDF5Plugin and the ModifyTBTHDF5Plugin.

 

Best,

 

Jeff

 

--

Jeff Glaubitz

Project Manager

Biology of Rare Alleles in Maize and its Wild Relatives

National Science Foundation award IOS-1238014

http://www.panzea.org

Institute for Genomic Diversity

Cornell University

175 Biotechnology Bldg

Ithaca, NY 14853

Phone: 607-255-1386

jcg...@cornell.edu


Yang Zhang

Jul 15, 2014, 1:04:36 PM
to tas...@googlegroups.com
Wow, thank you so very much, Jeff. You are so right. I forgot to check the format of my key.txt file. I created the key file on Windows but ran my pipeline on Linux. My bad.

Actually, I first used the key file to generate the tagCounts with FastqToTagCountPlugin, and that step did not give me any error message. But when I ran FastqToTBTPlugin and SeqToTBTHDF5Plugin, I got no output because of the format of the key file.

I do have about 200 barcodes; since I had problems getting the TBT output, I used just one sample for troubleshooting.

Jeff, I have one more concern. When I ran the pipeline using SeqToTBTHDF5Plugin:

run_pipeline.pl -fork1 -SeqToTBTHDF5Plugin -i fastq -k test_key.txt -e BamHI-MluCI -o tbt/tbt/test.h5 -s 900000000 -L tbt/tbt/test.log -t tagCounts/test/SA2269-TM1-4_FCC4LKRACXX_s_5.cnt -endPlugin -runfork1


I got the .h5 file and log file. The log file says:
File: /home/hbz-heep438-ubuntu/tassel5/fastq/SA2269-TM1-4_FCC4LKRACXX_s_5_fastq.txt.gz
Total reads: 2060073
Accepted reads (with barcode and cut site): 2055110(0.99759084 of total)
Accepted reads found in TOPM: 2055110(0.99759084 of total)
name    read count    fraction of total    mapped read count    fraction mapped of total
SA2269-TM1-4:FCC4LKRACXX:5:SA2269-TM1-4    2055110    0.99759084    2055110    1.0
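
As a sanity check on these numbers, the two fractions can be recomputed from the raw counts. This is a minimal Python sketch, assuming the whitespace-separated five-column layout shown above; the function name is hypothetical, and reading "fraction mapped of total" as mapped reads divided by that taxon's read count is an inference from this one log line, not something confirmed by the TASSEL documentation.

```python
def check_taxon_line(line: str, total_reads: int, tolerance: float = 1e-6) -> dict:
    """Recompute the two fractions reported on one per-taxon line of the log."""
    # Columns: name, read count, fraction of total,
    #          mapped read count, fraction mapped (assumed: mapped / read count)
    name, reads, frac_total, mapped, frac_mapped = line.split()
    reads, mapped = int(reads), int(mapped)

    expected_total = reads / total_reads
    expected_mapped = mapped / reads if reads else 0.0
    return {
        "name": name,
        "fraction_of_total_ok": abs(expected_total - float(frac_total)) <= tolerance,
        "fraction_mapped_ok": abs(expected_mapped - float(frac_mapped)) <= tolerance,
    }
```

Passing the line above together with the "Total reads" value of 2060073 should report both fractions as consistent with the counts.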

It looks OK to me, but I also got another error message: "[Thread-0] ERROR net.maizegenetics.plugindef.AbstractPlugin - null"
Is this because of the -m parameter? Will this error have any bad effects on my downstream analysis using the GBS pipeline?

Below is the screen log containing the error message.


/home/hbz-heep438-ubuntu/tassel5/lib/commons-math-2.2.jar:/home/hbz-heep438-ubuntu/tassel5/lib/ejml-0.23.jar:/home/hbz-heep438-ubuntu/tassel5/lib/biojava3-alignment-3.0.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-ext.jar:/home/hbz-heep438-ubuntu/tassel5/lib/guava-14.0.1.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-gvt.jar:/home/hbz-heep438-ubuntu/tassel5/lib/xml.jar:/home/hbz-heep438-ubuntu/tassel5/lib/poi-3.0.1-FINAL-20070705.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-util.jar:/home/hbz-heep438-ubuntu/tassel5/lib/jfreechart-1.0.3.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-dom.jar:/home/hbz-heep438-ubuntu/tassel5/lib/jcommon-1.0.6.jar:/home/hbz-heep438-ubuntu/tassel5/lib/sTASSEL.jar:/home/hbz-heep438-ubuntu/tassel5/lib/forester.jar:/home/hbz-heep438-ubuntu/tassel5/lib/biojava3-phylo-3.0.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-parser.jar:/home/hbz-heep438-ubuntu/tassel5/lib/xmlParserAPIs.jar:/home/hbz-heep438-ubuntu/tassel5/lib/LiuExt.jar:/home/hbz-heep438-ubuntu/tassel5/lib/xercesImpl.jar:/home/hbz-heep438-ubuntu/tassel5/lib/colt.jar:/home/hbz-heep438-ubuntu/tassel5/lib/cisd-jhdf5-batteries_included_lin_win_mac.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-awt-util.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-svggen.jar:/home/hbz-heep438-ubuntu/tassel5/lib/junit-4.10.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-xml.jar:/home/hbz-heep438-ubuntu/tassel5/lib/geronimo-spec-activation-1.0.2-rc4.jar:/home/hbz-heep438-ubuntu/tassel5/lib/log4j-1.2.13.jar:/home/hbz-heep438-ubuntu/tassel5/lib/mail-1.4.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-svg-dom.jar:/home/hbz-heep438-ubuntu/tassel5/lib/biojava3-core-3.0.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-css.jar:/home/hbz-heep438-ubuntu/tassel5/lib/itextpdf-5.1.0.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-gui-util.jar:/home/hbz-heep438-ubuntu/tassel5/sTASSEL.jar
Memory Settings: -Xms1536m -Xmx16g
Tassel Pipeline Arguments: -fork1 -SeqToTBTHDF5Plugin -i fastq -k test_key.txt -e BamHI-MluCI -o tbt/tbt/test.h5 -s 900000000 -L tbt/tbt/test.log -t tagCounts/test/SA2269-TM1-4_FCC4LKRACXX_s_5.cnt -endPlugin -runfork1

[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Version: 5.0.8  Date: June 26, 2014
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Max Available Memory Reported by JVM: 14563 MB
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Java Version: 1.7.0_55
net.maizegenetics.analysis.gbs.SeqToTBTHDF5Plugin
[Thread-0] INFO net.maizegenetics.analysis.gbs.SeqToTBTHDF5Plugin - FastqToTBTPlugin: setParameters: Using the following fastq files:
[Thread-0] INFO net.maizegenetics.analysis.gbs.SeqToTBTHDF5Plugin - /home/hbz-heep438-ubuntu/tassel5/fastq/SA2269-TM1-4_FCC4LKRACXX_s_5_fastq.txt.gz
Reading Haplotypes distribution from:tagCounts/test/SA2269-TM1-4_FCC4LKRACXX_s_5.cnt
Number of Tags in file:255424
[Thread-0] INFO net.maizegenetics.plugindef.AbstractPlugin -
SeqToTBTHDF5Plugin Parameters
i: fastq
k: test_key.txt
e: BamHI-MluCI
o: tbt/tbt/test.h5
s: 900000000
L: tbt/tbt/test.log
t: tagCounts/test/SA2269-TM1-4_FCC4LKRACXX_s_5.cnt
m: null

Creating HDF5 file: tbt/tbt/test.h5
65536
tagChunks 4 Div 3.89746

Working on fastq file: /home/hbz-heep438-ubuntu/tassel5/fastq/SA2269-TM1-4_FCC4LKRACXX_s_5_fastq.txt.gz
Enzyme: BamHI-MluCI
TACCT SA2269-TM1-4:FCC4LKRACXX:5:SA2269-TM1-4

Total barcodes found in lane:1
Total barcodes found in lane:1
Total Reads:1000000 goodReads:998710 goodMatched:998710
Total Reads:2000000 goodReads:1995273 goodMatched:1995273

Timing process (writing TagsByTaxa file)...
[Thread-0] ERROR net.maizegenetics.plugindef.AbstractPlugin - null
[Thread-0] INFO net.maizegenetics.plugindef.AbstractPlugin -
Usage:
SeqToTBTHDF5Plugin <options>
-i <Input Directory> : Input directory containing .fastq files (required)
-k <Key File> : Barcode key file (required)
-e <Enzyme> : Enzyme used to create the GBS library, if it differs from the one listed in the key file.
-o <Output File> : Output HDF5 file (required)
-s <Max Good Reads> : Max good reads per lane. (Default: 500000000)
-L <Log File> : Output log file (required)
-t <Tag Count File> : Tag count file. (Only -t or -m allowed)
-m <Physical Map File> : Physical map file containing alignments. (Only -t or -m allowed)


Thank you so very much for your input. I really appreciate it.

Regards,
Yang




Terry Casstevens

Jul 15, 2014, 1:12:27 PM
to Tassel User Group
Thank you for your well-described question and information. Would you
rerun with the -debug flag? Hopefully that will show additional
information.

run_pipeline.pl -debug ...

Yang Zhang

Jul 15, 2014, 1:24:29 PM
to tas...@googlegroups.com
Thank you, Terry. Here is the pipeline output with the -debug flag.

hbz-heep438-ubuntu@HPCT438-D598H02:~/tassel5$ run_pipeline.pl -debug -fork1 -SeqToTBTHDF5Plugin -i fastq -k test_key.txt -e BamHI-MluCI -o tbt/tbt/test.h5 -s 900000000 -L tbt/tbt/test.log -t tagCounts/test/SA2269-TM1-4_FCC4LKRACXX_s_5.cnt -endPlugin -runfork1


/home/hbz-heep438-ubuntu/tassel5/lib/commons-math-2.2.jar:/home/hbz-heep438-ubuntu/tassel5/lib/ejml-0.23.jar:/home/hbz-heep438-ubuntu/tassel5/lib/biojava3-alignment-3.0.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-ext.jar:/home/hbz-heep438-ubuntu/tassel5/lib/guava-14.0.1.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-gvt.jar:/home/hbz-heep438-ubuntu/tassel5/lib/xml.jar:/home/hbz-heep438-ubuntu/tassel5/lib/poi-3.0.1-FINAL-20070705.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-util.jar:/home/hbz-heep438-ubuntu/tassel5/lib/jfreechart-1.0.3.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-dom.jar:/home/hbz-heep438-ubuntu/tassel5/lib/jcommon-1.0.6.jar:/home/hbz-heep438-ubuntu/tassel5/lib/sTASSEL.jar:/home/hbz-heep438-ubuntu/tassel5/lib/forester.jar:/home/hbz-heep438-ubuntu/tassel5/lib/biojava3-phylo-3.0.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-parser.jar:/home/hbz-heep438-ubuntu/tassel5/lib/xmlParserAPIs.jar:/home/hbz-heep438-ubuntu/tassel5/lib/LiuExt.jar:/home/hbz-heep438-ubuntu/tassel5/lib/xercesImpl.jar:/home/hbz-heep438-ubuntu/tassel5/lib/colt.jar:/home/hbz-heep438-ubuntu/tassel5/lib/cisd-jhdf5-batteries_included_lin_win_mac.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-awt-util.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-svggen.jar:/home/hbz-heep438-ubuntu/tassel5/lib/junit-4.10.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-xml.jar:/home/hbz-heep438-ubuntu/tassel5/lib/geronimo-spec-activation-1.0.2-rc4.jar:/home/hbz-heep438-ubuntu/tassel5/lib/log4j-1.2.13.jar:/home/hbz-heep438-ubuntu/tassel5/lib/mail-1.4.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-svg-dom.jar:/home/hbz-heep438-ubuntu/tassel5/lib/biojava3-core-3.0.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-css.jar:/home/hbz-heep438-ubuntu/tassel5/lib/itextpdf-5.1.0.jar:/home/hbz-heep438-ubuntu/tassel5/lib/batik-gui-util.jar:/home/hbz-heep438-ubuntu/tassel5/sTASSEL.jar
Memory Settings: -Xms1536m -Xmx16g
Tassel Pipeline Arguments: -debug -fork1 -SeqToTBTHDF5Plugin -i fastq -k test_key.txt -e BamHI-MluCI -o tbt/tbt/test.h5 -s 900000000 -L tbt/tbt/test.log -t tagCounts/test/SA2269-TM1-4_FCC4LKRACXX_s_5.cnt -endPlugin -runfork1
[Thread-0] DEBUG net.maizegenetics.plugindef.AbstractPlugin - null
java.nio.BufferOverflowException
    at java.nio.Buffer.nextPutIndex(Buffer.java:513)
    at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:163)
    at net.maizegenetics.dna.tag.TagsByTaxaByteHDF5TaxaGroups.encodeBySign(TagsByTaxaByteHDF5TaxaGroups.java:194)
    at net.maizegenetics.dna.tag.TagsByTaxaByteHDF5TaxaGroups.encodeBySign(TagsByTaxaByteHDF5TaxaGroups.java:180)
    at net.maizegenetics.dna.tag.TagsByTaxaByteHDF5TaxaGroups.addTaxon(TagsByTaxaByteHDF5TaxaGroups.java:130)
    at net.maizegenetics.analysis.gbs.SeqToTBTHDF5Plugin.writeTBT(SeqToTBTHDF5Plugin.java:300)
    at net.maizegenetics.analysis.gbs.SeqToTBTHDF5Plugin.matchTagsToTaxa(SeqToTBTHDF5Plugin.java:225)
    at net.maizegenetics.analysis.gbs.SeqToTBTHDF5Plugin.processData(SeqToTBTHDF5Plugin.java:88)
    at net.maizegenetics.plugindef.AbstractPlugin.performFunction(AbstractPlugin.java:80)

    at net.maizegenetics.plugindef.AbstractPlugin.dataSetReturned(AbstractPlugin.java:1145)
    at net.maizegenetics.plugindef.ThreadedPluginListener.run(ThreadedPluginListener.java:29)



Regards,
Yang
