Dear Jon,
I have a question about filtering bwa output. In the dDocent code, the mapping command for paired-end reads is below:
bwa mem reference.fasta $i.R1.fq.gz -L 20,5 -t $NUMProc -a -M -T 10 -A $optA -B $optB -O $optO -R "@RG\tID:$i\tSM:$i\tPL:Illumina" 2> bwa.$i.log | mawk '$6 !~/[2-9].[SH]/ && $6 !~ /[1-9][0-9].[SH]/' | samtools view -@$NUMProc -q 1 -SbT reference.fasta - > $i.bam 2>$i.bam.log
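For reference, here is a minimal sketch of what the mawk step in that pipeline seems to do to individual CIGAR strings. It assumes plain awk handles these two regular expressions the same way mawk does, and the six sample records (r1 to r6) are made up for illustration; they are not real bwa output. The helper name cigar_filter is also hypothetical.

```shell
# Same two CIGAR tests as the mawk step in the dDocent command, wrapped in a
# helper. Plain awk is used here instead of mawk (assumed equivalent for these regexes).
cigar_filter() {
  awk '$6 !~ /[2-9].[SH]/ && $6 !~ /[1-9][0-9].[SH]/'
}

# Made-up SAM-like records: name, flag, contig, position, MAPQ, CIGAR (field 6).
kept=$(printf '%s\n' \
  'r1 0 ctg1 1 60 100M' \
  'r2 0 ctg1 1 60 5S95M' \
  'r3 0 ctg1 1 60 19S81M' \
  'r4 0 ctg1 1 60 20S80M' \
  'r5 0 ctg1 1 60 70M30H' \
  'r6 0 ctg1 1 60 105S45M' \
  | cigar_filter | awk '{print $1}' | paste -sd ' ' -)

# Names of the records that survive the filter.
echo "$kept"
```

On these sample records the filter keeps r1, r2 and r3 and drops r4, r5 and r6, which suggests it only discards alignments with a soft or hard clip of about 20 bases or more, while shorter clips pass through.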
The command filters the SAM output generated by bwa based on the CIGAR string. It removes soft- or hard-clipped alignments. However, this step was missing in older versions of dDocent, for example the outdated dDocent.GATK. Its mapping command is below:
bwa mem reference.fasta $i.R1.fq $i.R2.fq -t $NUMProc -a -M -T 10 -A $optA -B $optB -O $optO -R "@RG\tID:$i\tSM:$i\tPL:Illumina" > $i.sam 2> bwa.$i.log
Therefore, I wonder why you added the CIGAR-based filtering step in the new version. As far as I know, variant callers like GATK or freebayes ignore soft- or hard-clipped alignments unless they are looking for structural variants.
Any opinions would be much appreciated!
Many thanks,
Ivan