Hi there,
I have been working with the GBS pipeline of TASSEL3
to analyze GBS data (restriction enzyme based reduced representation
libraries).
The pipeline finished succesfully now, but in the
end result I am getting very limited SNPs (~500). I have analyzed the
same data with different procedures and I get many more SNPs, so I am
confident there are much more SNPs in the data.
Below are the
commands that I am using. I was hoping there was some filtering which I
could turn off, but I haven't found any yet.
Does anyone have any advice on what could be the problem or what kind of checks I can make to troubleshoot this?
Thanks very much in advance!
echo "COMMAND: FastqToTagCountPlugin"
perl run_gbs_pipeline.pl
-fork1 -FastqToTagCountPlugin -i $work_folder/00-fastq -k
$work_folder/00-key/key.txt -e ApeKI -o $work_folder/01-tagCounts -s
3000000 -endPlugin -runfork1
## Usage is as follows:
## -i Input directory containing FASTQ files in text or gzipped text.
## NOTE: Directory will be searched recursively and should
## be written WITHOUT a slash after its name.
##
## -k Key file listing barcodes distinguishing the samples
## -e Enzyme used to create the GBS library, if it differs from the one listed in the key file.
## -s Max good reads per lane. (Optional. Default is 200,000,000).
## -c Minimum tag count (default is 1).
## -o Output directory to contain .cnt files (one per FASTQ file, defaults to input directory).
echo "COMMAND: MergeMultipleTagCountPlugin"
perl run_gbs_pipeline.pl -fork1 -MergeMultipleTagCountPlugin -i $work_folder/01-tagCounts/ -o $work_folder/02-mergedTagCounts/myMasterGBSTags.cnt -endPlugin -runfork1
## Usage is as follows:
## -i Input directory containing .cnt files
## -o Output file name
## -c Minimum count of reads to be output (default 1)
## -t Specifies that reads should be output in FASTQ text format.
echo "COMMAND: TagCountToFastqPlugin"
perl run_gbs_pipeline.pl -fork1 -TagCountToFastqPlugin -i $work_folder/02-mergedTagCounts/myMasterGBSTags.cnt -o $work_folder/02-mergedTagCounts/myMasterGBSTags.fastq -endPlugin -runfork1
## The options for the TagCountToFastqPlugin are:
## -i Input binary tag count (*.cnt) file
## -o Output fastq file to use as input for BWA or bowtie2
## -c Minimum count of reads for a tag to be output (default: 1)
echo "COMMAND: Indexing reference for mapping with BWA MEM"
bwa index -a bwtsw $reference
echo "COMMAND: mapping with BWA MEM"
bwa aln -q 20 -t 10 -k 1 $reference $work_folder/02-mergedTagCounts/myMasterGBSTags.fastq > $work_folder/03-mapping/Mapping.sai
bwa samse $reference $work_folder/03-mapping/Mapping.sai $work_folder/02-mergedTagCounts/myMasterGBSTags.fastq > $work_folder/03-mapping/Mapping.sam
echo "COMMAND: SAMConverterPlugin"
perl run_gbs_pipeline.pl -fork1 -SAMConverterPlugin -i $work_folder/03-mapping/Mapping.sam -o $work_folder/04-topm/myMasterTags.topm -endPlugin -runfork1
## Usage is as follows:
## -i Name of input file in SAM text format (required)
## -o Name of output file (default output.topm.bin)
## -t Specifies text output format
## -l tag length in integer multiples of 32 bases (default=2)
echo "COMMAND: FastqToTBTPlugin"
perl run_gbs_pipeline.pl
-fork1 -FastqToTBTPlugin -i $work_folder/00-fastq -k
$work_folder/00-key/key.txt -e ApeKI -o $work_folder/05-tbt -y -t
$work_folder/02-mergedTagCounts/myMasterGBSTags.cnt -endPlugin -runfork1
## Usage is as follows:
## -i Input directory containing .fastq files
## -k Barcode key file
## -e Enzyme used to create the GBS library, if it differs from the one listed in the key file.
## -o Output directory
## -s Max good reads per lane. (Optional. Default is 200,000,000).
## -c Minimum taxa count within a fastq file for a tag to be output (default 1)
## -y Output to tagsByTaxaByte (tag counts per taxon from 0 to 127) instead of tagsByTaxaBit (0 or 1)
## One of either:
## -t Tag count file, OR A
## -m Physical map file containing alignments
echo "COMMAND: MergeTagsByTaxaFilesPlugin"
perl run_gbs_pipeline.pl -fork1 -MergeTagsByTaxaFilesPlugin -i $work_folder/05-tbt -o $work_folder/06-mergedTBT/myStudy.tbt.byte -s 10000000 -endPlugin -runfork1
## Usage is as follows:
## -i Input directory containing .tbt.bin or .tbt.byte files
## -o Output file name
## -s Maximum number of tags the TBT can hold while merging (default: 200000000)
## -x Merge tag counts of taxa with identical names (default: false)
## -h Call snps in output and write to HapMap file with the provided name
echo "COMMAND: TagsToSNPByAlignment"
perl run_gbs_pipeline.pl -fork1 -TagsToSNPByAlignmentPlugin -i $work_folder/06-mergedTBT/myStudy.tbt.byte -y \
-m $work_folder/04-topm/myMasterTags.topm -o $work_folder/07-hapmap/raw -s 0 -e 370646 -endPlugin -runfork1
## The available options for the TagsToSNPByAlignmentPlugin are as follows:
## -i Input .tbt file
## -y Use byte-formatted TBT file (*.tbt.byte)
## -m TagsOnPhysicalMap file containing genomic position of tags
## -mUpd Update TagsOnPhysicalMap file with allele calls, saved to specified file
## -o Output directory (default current directory)
## -mxSites Maximum number of sites (SNPs) output per chromosome (default: 200000)
## -mnF Minimum F (inbreeding coefficient) (default: -2.0 = no filter)
## -p Pedigree file containing full sample names (or expected names after merging) & expected inbreeding
## coefficient (F) for each. Only taxa with expected F >= mnF used to calculate F = 1-Ho/He.
## (default: use ALL taxa to calculate F)
## -mnMAF Minimum minor allele frequency (default: 0.01)
## -mnMAC Minimum minor allele count (default: 10)
## -mnLCov Minimum locus coverage (proportion of Taxa with a genotype) (default: 0.1)
## -errRate Average sequencing error rate per base (used to decide between heterozygous and homozygous calls) (default: 0.01)
## -ref Path to reference genome in fasta format. Ensures that a tag from the reference genome is always included
## when the tags at a locus are aligned against each other to call SNPs. The reference allele for each site
## is then provided in the output HapMap files, under the taxon name "REFERENCE_GENOME" (first taxon).
## DEFAULT: Don't use reference genome.
## -inclRare Include the rare alleles at site (3 or 4th states) (default: false)
## -inclGaps Include sites where major or minor allele is a GAP (default: false)
## -callBiSNPsWGap Include sites where the third allele is a GAP (default: false)
## -s Start chromosome
## -e End chromosome