number of mismatches

Olivier SAULNIER

Nov 26, 2015, 3:19:24 PM
to rna-star
Hi all,

I moved to STAR v2.5.0 for paired-end (2x101) RNA-seq data.
I get a really good percentage of uniquely mapped reads.
But when I look at the alignments in IGV, I can see some reads with a really bad alignment (CIGAR 37M64S; by the way, I don't know why mismatches are called S?).
How can this read be aligned like that, when its mate is well aligned (CIGAR 101M)?

COMMAND="${STAR} --runThreadN 8 --runMode alignReads --genomeDir ${ID} --readFilesIn ${DATA}/${SAMPLE}_1.fastq ${DATA}/${SAMPLE}_2.fastq --sjdbGTFfile /data/annotations/Galaxy/RefGene/hg19/hg19_refGene.gtf --outSAMattributes All --outFileNamePrefix ${JOBNAME} --outFilterMultimapNmax 1 --outFilterMismatchNmax 3 --outSJfilterCountUniqueMin 3 3 3 3 --outSJfilterCountTotalMin 3 3 3 3 -sjdbOverhang 100 --outSAMmultNmax 1 --outSAMtype BAM SortedByCoordinate  --outReadsUnmapped Fastx --outBAMsortingThreadN 8"

Many thanks,
Olivier S

STAR2.5.0_test1_SRR1594024Log.out

Alexander Dobin

Nov 30, 2015, 12:06:15 PM
to rna-star
Hi Olivier,

S in the CIGAR represents "soft-clipping", i.e. bases that could not be aligned and were trimmed from the ends of the reads.
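
To make the CIGAR reading concrete, here is a minimal sketch that sums the soft-clipped bases in a CIGAR string. The function name `soft_clipped_bases` is hypothetical and only for illustration; it assumes a `grep` that supports `-o` (GNU and BSD grep both do) and it is not part of STAR or samtools.

```shell
# Sum the lengths of all soft-clip (S) operations in a CIGAR string.
# Hypothetical helper for illustration only.
soft_clipped_bases() {
    # grep -o prints each "<length>S" operation on its own line;
    # awk strips the trailing S and adds the lengths (0 if there are none).
    printf '%s\n' "$1" \
        | grep -oE '[0-9]+S' \
        | awk '{ sub(/S$/, ""); total += $0 } END { print total + 0 }'
}

soft_clipped_bases "37M64S"   # the read from the question: 37 aligned, 64 clipped
soft_clipped_bases "101M"     # its mate: fully aligned, nothing clipped
```

For the read in question, only the first 37 bases are part of the alignment; the remaining 64 are kept in the record but are not aligned to the genome.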
The mismatches inside the soft-clipped ends are not counted by STAR (even though IGV will show them as highly mismatched tails). If you want to control the amount of soft-clipping, please look at this post:

There are many reasons why the 64 nt from this particular read could not be mapped: (i) poor sequencing quality in the tail; (ii) A-tail; (iii) adapter; (iv) hard-to-detect junction; (v) chimeric/fusion junction.
To detect (v), you can switch on chimeric detection and see whether this read is reported as chimeric. You can also BLAST the sequence.
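
As a rough sketch of switching on chimeric detection: STAR enables it when `--chimSegmentMin` is set to a positive value. The value 20 below is only an illustrative choice, and the snippet assumes `COMMAND` already holds the STAR invocation from the original post (it falls back to a bare `STAR` placeholder otherwise).

```shell
# Append chimeric detection to the existing STAR command line.
# Assumption: COMMAND is the variable from the original post.
# 20 is an illustrative minimum chimeric segment length, not a recommendation.
COMMAND="${COMMAND:-STAR} --chimSegmentMin 20"

# Print the command for inspection before running it.
echo "${COMMAND}"
```

In this STAR version the chimeric alignments should then be written to separate `Chimeric.out.*` files under the same output prefix, so one can search those files for the read name. Using a different `--outFileNamePrefix` for the rerun avoids overwriting the earlier results.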

Cheers
Alex