Hi vin69110,
The Flowcell column in your barcode key file needs to match the flowcell part of your file name, and cannot contain an underscore. So, try changing “AC1JDAACXX_1094” to “C1JDAACXX” (or “C10RAACXX”).
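If it helps, here is a minimal Python sketch of that check (not part of TASSEL). It assumes the flowcell part of the file name is everything before the first underscore, as in C10RAACXX_4_fastq.txt; the helper names flowcell_from_filename and check_flowcell are made up for illustration.

```python
import os


def flowcell_from_filename(path):
    """Return the text before the first underscore in the file name.

    Assumption: the file is named <flowcell>_<lane>_fastq.txt.
    """
    return os.path.basename(path).split("_", 1)[0]


def check_flowcell(key_flowcell, fastq_path):
    """Return a list of problems with a key-file Flowcell value (empty if none)."""
    problems = []
    if "_" in key_flowcell:
        problems.append("Flowcell contains an underscore")
    expected = flowcell_from_filename(fastq_path)
    if key_flowcell != expected:
        problems.append(
            f"Flowcell {key_flowcell!r} does not match file name part {expected!r}"
        )
    return problems


fastq = "C10RAACXX_4_fastq.txt"
print(check_flowcell("AC1JDAACXX_1094", fastq))  # two problems reported
print(check_flowcell("C10RAACXX", fastq))        # no problems
```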
Best,
Jeff
--
Jeff Glaubitz
Project Manager
Genetic Architecture of Maize and Teosinte
National Science Foundation award 0820619
Institute for Genomic Diversity
Cornell University
175 Biotechnology Bldg
Ithaca, NY 14853
Phone: 607-255-1386
--
You received this message because you are subscribed to the Google Groups "TASSEL - Trait Analysis by Association, Evolution and Linkage" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
tassel+un...@googlegroups.com.
To post to this group, send email to tas...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tassel/74c6b360-11ee-45df-8633-e21e4d094728%40googlegroups.com?hl=en-US.
For more options, visit https://groups.google.com/groups/opt_out.
Hi vin69110,
I don’t think that the memory error message is related, but you could try increasing the memory and see if the StringIndexOutOfBoundsException goes away.
Looking at where the index out of range error is being thrown in the code, I suspect that your fastq file is not being read properly for some reason. We divide each sequence read into 2 chunks of 32 bases, storing each chunk as 64 bits (two bits for each nucleotide, where 00=A, 01=C, 10=G, and 11=T). That is where the 32 in “String index out of range: 32” comes from. This part of the code is trying to figure out the barcode, so it only uses the first 32 bases. If it throws this error, then it is probably reading something other than a sequence read from the file: a string shorter than 32 characters, so String.substring(0,32) fails. Your sequences are 100 bases long, so the reads themselves are not too short.
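To make the 2-bit packing concrete, here is a rough Python sketch of the idea. It is not the TASSEL code, which is Java. The name encode_chunk and the bit order (first base in the highest bits) are assumptions for illustration. Python slicing does not fail on short strings the way Java's String.substring(0,32) does, so the sketch checks the length explicitly to mimic that failure.

```python
# Two bits per nucleotide, as described above: 00=A, 01=C, 10=G, 11=T.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
CHUNK = 32  # bases per chunk; 32 bases * 2 bits = 64 bits


def encode_chunk(seq):
    """Pack the first 32 bases of seq into one integer that fits in 64 bits.

    Assumption: the first base ends up in the most significant bits.
    A base other than A, C, G or T raises KeyError in this sketch.
    """
    if len(seq) < CHUNK:
        # Mimics what Java's String.substring(0, 32) does on a short string.
        raise IndexError(f"String index out of range: {CHUNK}")
    value = 0
    for base in seq[:CHUNK]:
        value = (value << 2) | CODE[base]
    return value


print(hex(encode_chunk("ACGT" * 25)))  # a 100-base read: only the first 32 bases are used

try:
    encode_chunk("@header_line")       # something that is not a full sequence read
except IndexError as err:
    print("failed:", err)
```

If a header line, a blank line, or a truncated line is passed where a sequence is expected, the length check fails in the same way the substring call would.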
Maybe the fastq file has Linux line endings and you are trying to run the pipeline on a Mac (or vice versa), or perhaps it should be named *.txt.gz instead of *.txt? Try:
wc -l /usda_ars_data/Bassil_Lab/FC1094/lane2/fastq/C10RAACXX_4_fastq.txt
and see what you get. This should tell you how many lines are in the file (4 times the number of reads, since each fastq record takes 4 lines).
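If you want to check the other possibilities at the same time, here is a small Python sketch (not part of TASSEL) that counts lines, looks for Windows-style line endings, tests whether the file is really gzipped by looking at its first two bytes, and counts sequence lines shorter than 32 characters. The function name inspect_fastq and the report keys are made up for illustration, and the demo runs on a tiny temporary file rather than your data.

```python
import gzip
import os
import tempfile


def inspect_fastq(path):
    """Return basic diagnostics for a fastq file (plain text or gzipped)."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    lines = crlf = short = 0
    with opener(path, "rb") as fh:
        # Binary iteration splits on "\n" only, so a file with bare "\r"
        # line endings would show up as a single very long line.
        for i, raw in enumerate(fh):
            lines += 1
            if raw.endswith(b"\r\n"):
                crlf += 1
            # The second line of every 4-line record is the sequence.
            if i % 4 == 1 and len(raw.rstrip(b"\r\n")) < 32:
                short += 1
    return {
        "gzip": is_gzip,
        "lines": lines,
        "reads": lines // 4,
        "complete": lines % 4 == 0,
        "crlf_lines": crlf,
        "short_reads": short,
    }


# Demo on a made-up two-read file.
record = b"@read\n" + b"ACGT" * 25 + b"\n+\n" + b"I" * 100 + b"\n"
with tempfile.NamedTemporaryFile("wb", suffix="_fastq.txt", delete=False) as tmp:
    tmp.write(record * 2)
report = inspect_fastq(tmp.name)
os.remove(tmp.name)
print(report)
```

A file that reports gzip as True but is named *.txt, a line count that is not a multiple of 4, or any short sequence lines would each point to the file not being read the way the pipeline expects.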
Best,
Jeff
From: tas...@googlegroups.com [mailto:tas...@googlegroups.com] On Behalf Of vin6...@gmail.com
Reads with N’s in either the barcode or the subsequent 64 bases are not used by the pipeline. Barcodes must match exactly.
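As a rough illustration of that rule, here is a Python sketch (not the pipeline's actual code). The function name assign_barcode, the example barcodes, and the example reads are all made up. The sketch ignores the cut site and does not address reads shorter than barcode plus 64 bases, and trying longer barcodes first is an assumption.

```python
def assign_barcode(read, barcodes, tag_len=64):
    """Return the barcode that exactly matches the start of the read.

    Returns None if no barcode matches exactly, or if there is an N in
    the tag_len bases that follow the barcode. An N inside the barcode
    region can never match, because barcodes contain only A, C, G, T.
    """
    # Assumption: try longer barcodes first in case one is a prefix of another.
    for bc in sorted(barcodes, key=len, reverse=True):
        if read.startswith(bc):
            window = read[len(bc):len(bc) + tag_len]
            return None if "N" in window else bc
    return None


barcodes = ["ACGT", "TTGCA"]
reads = [
    "ACGT" + "G" * 70,                   # clean read
    "ACNT" + "G" * 70,                   # N in the barcode
    "ACGT" + "G" * 10 + "N" + "G" * 60,  # N within the next 64 bases
    "ACGT" + "G" * 64 + "N" * 5,         # N only after the 64 bases
]
assigned = [assign_barcode(r, barcodes) for r in reads]
print(f"total reads: {len(reads)}, usable: {sum(a is not None for a in assigned)}")
```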
Hi Vin,
No, reads with N do not cause the code to throw errors. It merely skips them (they are still counted toward the total number of reads). Such reads are common at the beginning and end of the fastq files (edge effect on the flowcell).
There is no script available to correct single 5’ Ns.