Daniel,
Hopefully someone on the list can suggest a modern way to generate GBS library metrics from a FASTQ file.
As a "sanity check," you can generate GBS library metrics, independent of the TASSEL v3 pipeline, using:
buildtassel3/stats/countbarcodes.pl at main · petersm3/buildtassel3
Feel free to contact me off list if you have any questions about running the script.
Thank you,
Matthew
---
grep -A4 "^# Purpose" countbarcodes.pl
# Purpose : From Illumina GBS lane(s) count the number of occurrences of:
# - barcodes with restriction sites, any length, with Ns
# - barcodes with restriction sites, remove the barcode, truncate to 64
# bases and then filter out reads with Ns (aka TASSEL 3 processing)
# Produce a CSV with the resulting values to stdout
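
For anyone who wants to see the two counts above in miniature, here is a rough Python sketch of the idea. It is not the Perl script itself. The function names, the PstI cut-site remnant "TGCAG", and the longest-barcode-first matching are my assumptions for illustration, and it does not try to reproduce how TASSEL 3 treats reads shorter than 64 bases.

```python
import gzip
from itertools import islice


def read_sequences(path):
    """Yield the sequence line of each record in a gzipped FASTQ file.

    Assumes plain four-line FASTQ records (no wrapped sequences).
    """
    with gzip.open(path, "rt") as handle:
        # Line 2 of every 4-line record holds the bases.
        for line in islice(handle, 1, None, 4):
            yield line.strip().upper()


def count_reads(reads, barcodes, cut_site="TGCAG", tag_len=64):
    """Return (any_length, tassel3_style) counts.

    any_length    : reads starting with barcode + cut site, Ns allowed
    tassel3_style : same reads after removing the barcode, truncating to
                    tag_len bases, and dropping any tag that contains an N
    cut_site is an assumed remnant for PstI; check the enzyme list for yours.
    """
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    ordered = sorted(barcodes, key=len, reverse=True)
    any_length = 0
    tassel3_style = 0
    for seq in reads:
        for barcode in ordered:
            if seq.startswith(barcode + cut_site):
                any_length += 1
                tag = seq[len(barcode):len(barcode) + tag_len]
                if "N" not in tag:
                    tassel3_style += 1
                break  # a read is assigned to at most one barcode
    return any_length, tassel3_style
```

Something like `count_reads(read_sequences("ALL_AAF327YM5_s_1_fastq.txt.gz"), ["TGACGCCA", "CAGATA"])` would then give the two totals for one lane, which is the gap between "reads with a barcode" and "reads TASSEL 3 would keep".
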
./countbarcodes.pl
Usage: ./countbarcodes.pl -i fastq_input_dir -k key_file -e enzyme_file > countbarcodes.csv
-i, --input
Directory containing the gzipped FASTQ file(s) to be filtered
TASSEL format filenames, e.g., code_FLOWCELL_s_LANE_fastq.txt.gz
Important filename identifiers: _FLOWCELL_, _LANE_, fastq, gz
FLOWCELL and LANE will match columns 1 and 2 in the TASSEL key file
-k, --key
TASSEL key file (CSV or TSV format accepted)
-e, --enzyme
Text file containing the enzyme used to cut the GBS lane
See (TASSEL 3) TasselPipelineGBS.pdf page 8 for a list of enzymes
-h, --help
This usage information.
--
Example of the file naming conventions expected by the script:
cat enzyme.txt
PstI-MspI
head -n5 key.csv
Flowcell,Lane,Barcode,Sample,PlateName,Row,Column,LibraryPrepID,Comments
AAF327YM5,1,TGACGCCA,A1,Plate1,A,1,,A1
AAF327YM5,1,CAGATA,B1,Plate1,B,1,,B1
AAF327YM5,1,GAAGTG,C1,Plate1,C,1,,C1
AAF327YM5,1,TAGCGGAT,D1,Plate1,D,1,,D1
FASTQ filename: ALL_AAF327YM5_s_1_fastq.txt.gz
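
To show how the filename and the key file line up, here is a small Python sketch that pulls FLOWCELL and LANE out of a TASSEL-style filename and selects the matching key rows. The regular expression and the function names are my own guesses at the convention described in the usage text, not code taken from countbarcodes.pl.

```python
import csv
import io
import re

# Assumed pattern for code_FLOWCELL_s_LANE_fastq.txt.gz
FASTQ_NAME = re.compile(
    r"^(?P<code>.+)_(?P<flowcell>[^_]+)_s_(?P<lane>[^_]+)_fastq\.txt\.gz$"
)

# First rows of the example key file shown above.
KEY_TEXT = """Flowcell,Lane,Barcode,Sample,PlateName,Row,Column,LibraryPrepID,Comments
AAF327YM5,1,TGACGCCA,A1,Plate1,A,1,,A1
AAF327YM5,1,CAGATA,B1,Plate1,B,1,,B1
AAF327YM5,1,GAAGTG,C1,Plate1,C,1,,C1
AAF327YM5,1,TAGCGGAT,D1,Plate1,D,1,,D1
"""


def flowcell_lane(filename):
    """Return (flowcell, lane) parsed from a TASSEL-format FASTQ filename."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a TASSEL-format FASTQ filename: {filename}")
    return match.group("flowcell"), match.group("lane")


def barcodes_for(filename, key_text):
    """Map barcode -> sample for key rows matching the file's flowcell and lane."""
    flowcell, lane = flowcell_lane(filename)
    # The key file may be CSV or TSV; pick the delimiter from the header line.
    header = key_text.splitlines()[0]
    delimiter = "\t" if "\t" in header else ","
    rows = csv.DictReader(io.StringIO(key_text), delimiter=delimiter)
    return {
        row["Barcode"]: row["Sample"]
        for row in rows
        if row["Flowcell"] == flowcell and row["Lane"] == lane
    }


if __name__ == "__main__":
    print(flowcell_lane("ALL_AAF327YM5_s_1_fastq.txt.gz"))
    print(barcodes_for("ALL_AAF327YM5_s_1_fastq.txt.gz", KEY_TEXT))
```

If the filename and the first two key columns disagree, `barcodes_for` returns an empty mapping, which is a quick way to spot a naming mismatch before running the real script.
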