Hi,
I'm using STAR 2.6.0c to map single-end (SE) 50 bp reads against a reference genome. I want to avoid soft clipping and allow at most 1 mismatch.
I'm using the following command:
STAR --runThreadN 20 --genomeDir /path/to/dir \
  --readFilesIn /path/to/dir/sample.fastq.gz --readFilesCommand zcat \
  --outStd BAM_Unsorted --outMultimapperOrder Random \
  --outSAMtype BAM Unsorted --outSAMunmapped None \
  --outSAMprimaryFlag AllBestScore \
  --outFilterMismatchNmax 1 --alignEndsType EndToEnd
I expected the nM tag to be either nM:i:0 or nM:i:1, but a small fraction of reads (~0.01%) in the output have values from nM:i:2 up to nM:i:9.
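For anyone wanting to reproduce the count, a small awk helper like the one below tallies the nM values per alignment record. This is only a sketch: `nm_histogram` is a hypothetical name, it assumes samtools is available to decode the BAM, and `sample.bam` stands for wherever the `--outStd BAM_Unsorted` stream was redirected.

```shell
# Hypothetical helper: tally the nM:i: tag (mismatches per alignment)
# in SAM text read from stdin and print count and percentage per value.
# Usage (assumes samtools is installed): samtools view sample.bam | nm_histogram
nm_histogram() {
  awk '
    /^@/ { next }                         # skip SAM header lines, if present
    {
      for (i = 12; i <= NF; i++) {        # optional tags start at column 12
        if ($i ~ /^nM:i:/) {
          n = substr($i, 6)               # integer value after "nM:i:"
          count[n]++
          total++
          break
        }
      }
    }
    END {
      for (n in count)
        printf "nM=%s\t%d\t%.4f%%\n", n, count[n], 100 * count[n] / total
    }
  ' | sort -t '=' -k2,2n                  # order rows by nM value
}
```

Because it counts alignment records rather than reads, a multimapper contributes one count per reported alignment, so the percentages can differ slightly from a per-read figure.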
Below is an extract from my output file:
SRR3170296.1133438 0 F21F3.6_transc 805 3 50M * 0 0 CTCAATTTTCGTAGTAATCATTCATCTCCAAAAAAAAAAAAAAGAATAAT BBBFFFFFFFFFFFFFFFIFIIIIIIIIIIIIIIIIIIFFFF77B<<B<B NH:i:2 HI:i:1 AS:i:33 nM:i:8
SRR3170296.4708597 0 Y55F3AM.6a.1_transc 1809 1 50M * 0 0 AAATAAGATCATACTCACTTTTTTTTTCTTTTTTTTTTTTTTGCTTTTTT BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIFFFFF<''0BBFFF NH:i:4 HI:i:2 AS:i:33 nM:i:8
SRR3170296.4708597 0 Y55F3AM.6a.2_transc 1760 1 50M * 0 0 AAATAAGATCATACTCACTTTTTTTTTCTTTTTTTTTTTTTTGCTTTTTT BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIFFFFF<''0BBFFF NH:i:4 HI:i:3 AS:i:33 nM:i:8
Interestingly, all three alignments have a CIGAR string of 50M.
What's happening? Should I also change other parameters?
Thanks,
Federico